Database and Data Mining Essay

Custom Student Mr. Teacher ENG 1001-04 20 August 2016

Database and Data Mining

i) For the first row, explain the “conf. %” output and how it is calculated. It includes the following interpretations:

The confidence of Rule #2 is 60.19%, marginally over 60%. Confidence is the rate at which the consequent occurs in transactions that contain the antecedent. Here the rule is “If Bronzer and Nail Polish, then Brushes and Concealer”, so the consequent is Brushes and Concealer, and the confidence tells us how many times Brushes and Concealer appear in transactions that contain Bronzer and Nail Polish. It is calculated as follows.

Confidence = {transactions with antecedent and consequent items} / {transactions with antecedent items}

According to the values in the matrix:

The support of the transactions containing Brushes, Concealer, Bronzer and Nail Polish, support (a U c), is 62, and the support of the transactions containing the antecedent (Bronzer and Nail Polish) is 103. Therefore:

Confidence = {transactions with antecedent and consequent items} / {transactions with antecedent items} = 62/103 = 0.6019417 = 60.19%
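The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `confidence` and its parameter names are illustrative choices, and the two counts are the ones quoted in the essay for the first row.

```python
def confidence(support_both: int, support_antecedent: int) -> float:
    """Share of antecedent transactions that also contain the consequent."""
    if support_antecedent <= 0:
        raise ValueError("antecedent support must be positive")
    return support_both / support_antecedent


# Counts quoted in the essay for the first row of the output:
# 62 transactions contain all four items, 103 contain the antecedent.
conf = confidence(support_both=62, support_antecedent=103)
print(f"{conf:.4%}")  # 60.1942%
```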

ii) For the first row, explain the “support (a)”, “support (c)” and “support (a U c)” output and how it is calculated.

Support is the number (or percentage) of transactions in which an item set occurs. It is reported separately for the antecedent item set, the consequent item set, and the two combined.

In the case of the matrix:

Support (a) is the number (or percentage) of transactions that contain the antecedent item set {Bronzer, Nail Polish}. In this case the antecedent appears in 103 transactions.

Support (c) is the number of transactions that contain the consequent item set. In this case {Brushes, Concealer} appears in 77 transactions.

Support (a U c) is the support of the combined item set, that is, the union of the antecedent and consequent items {Bronzer, Nail Polish, Brushes, Concealer}. It counts the transactions that contain all four items, so it cannot exceed Support (a) = 103 or Support (c) = 77. Here it is 62.
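To make the three support figures concrete, the sketch below counts them on a tiny hand-made list of baskets. The baskets are hypothetical and are not the cosmetics dataset, so the counts differ from the 103, 77 and 62 reported in the matrix; the helper name `support_count` is also an illustrative choice.

```python
# Hypothetical toy baskets, NOT the cosmetics dataset from the assignment.
transactions = [
    {"Bronzer", "Nail Polish", "Brushes", "Concealer"},
    {"Bronzer", "Nail Polish"},
    {"Brushes", "Concealer", "Blush"},
    {"Bronzer", "Nail Polish", "Brushes", "Concealer"},
    {"Nail Polish", "Brushes"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


antecedent = {"Bronzer", "Nail Polish"}
consequent = {"Brushes", "Concealer"}

support_a = support_count(antecedent, transactions)
support_c = support_count(consequent, transactions)
# The union of the item sets must appear in full, so this count
# can never exceed support_a or support_c.
support_a_u_c = support_count(antecedent | consequent, transactions)

print(support_a, support_c, support_a_u_c)  # 3 3 2
```

Note that the union is taken over the items, not over the transactions: adding items to the set can only shrink the number of baskets that match.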

iii) For the first row, explain the “Lift Ratio” and how it is calculated.

Lift ratio is another way of judging the strength of an association rule. It shows how effective the rule is at finding the consequent by comparing the confidence of the rule with a benchmark confidence value. The benchmark confidence is the confidence we would expect if the consequent were independent of the antecedent:

Benchmark confidence = no. of transactions with consequent item set / no. of transactions in database

The lift ratio is the confidence of the rule divided by this benchmark confidence:

Lift Ratio = confidence / benchmark confidence.

NB: If need be, the benchmark confidence can be recovered from the values already reported, since benchmark confidence = confidence / Lift Ratio.
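The relationship between confidence, benchmark confidence and lift ratio can be sketched as follows. The essay does not state the size of the database, so `N_TRANSACTIONS` below is a purely hypothetical placeholder, and the resulting lift is illustrative only; the function names are likewise assumptions made for this example.

```python
def benchmark_confidence(support_consequent: int, n_transactions: int) -> float:
    """Confidence expected if the consequent were independent of the antecedent."""
    return support_consequent / n_transactions


def lift_ratio(confidence: float, benchmark: float) -> float:
    """Confidence of the rule divided by the benchmark confidence."""
    return confidence / benchmark


def benchmark_from_lift(confidence: float, lift: float) -> float:
    """Recover the benchmark confidence from a reported confidence and lift."""
    return confidence / lift


N_TRANSACTIONS = 1000  # hypothetical: the essay does not give the database size

rule_confidence = 62 / 103                             # from the first row
benchmark = benchmark_confidence(77, N_TRANSACTIONS)   # Support (c) = 77
lift = lift_ratio(rule_confidence, benchmark)

# Going the other way, as the NB suggests, returns the same benchmark.
recovered = benchmark_from_lift(rule_confidence, lift)
print(abs(recovered - benchmark) < 1e-12)  # True
```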

iv) For the first row, explain the rule that is represented there in words.

Rule #2: If items Bronzer and Nail Polish are purchased, this implies items Brushes and Concealer are also purchased. This rule has a confidence of 60.19%.

v) Using XLMiner, apply association rules to the dataset on the cosmetics purchases.

vi) Interpret the first three rules in the output in words.

Rule #1: If item Brushes is purchased, this implies item Nail Polish is also purchased. This rule has a confidence of 100%.

Rule #2: If item Nail Polish is purchased, this implies item Brushes is also purchased. This rule has a confidence of 53.22%.

Rule #3: If item Blush is purchased, this implies item Concealer is also purchased. This rule has a confidence of 60.60%.

vii) Reviewing the first couple of dozen rules, comment on their redundancy and how you would assess their utility.

By definition, a rule is redundant with respect to another if it has at least the same confidence and support as the latter for every dataset. In the above list of rules, Rule #16 and Rule #17 meet this definition: they have the same confidence and support. The same situation arises with Rules #28 and #29 and with Rules #105 and #106.

Rules #1 and #2 show a similar overlap without having the same confidence ratio. Both rules involve the same two items, Brushes and Nail Polish, with antecedent and consequent swapped, but the confidence is 100% for Rule #1 and only 53.21% for Rule #2.
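One way to screen a rule list for this kind of overlap is to group rules by their combined item set. The sketch below does this for the three rules interpreted in part vi, using the confidence values as listed there; the tuple layout and the function name `group_by_itemset` are assumptions made for illustration, not XLMiner output.

```python
from collections import defaultdict

# (rule number, antecedent, consequent, confidence %) as listed in part vi.
rules = [
    (1, frozenset({"Brushes"}), frozenset({"Nail Polish"}), 100.0),
    (2, frozenset({"Nail Polish"}), frozenset({"Brushes"}), 53.22),
    (3, frozenset({"Blush"}), frozenset({"Concealer"}), 60.60),
]


def group_by_itemset(rules):
    """Return only the combined item sets that are shared by several rules."""
    groups = defaultdict(list)
    for number, antecedent, consequent, conf in rules:
        groups[antecedent | consequent].append((number, conf))
    return {items: members for items, members in groups.items() if len(members) > 1}


overlapping = group_by_itemset(rules)
for items, members in overlapping.items():
    print(sorted(items), members)
# ['Brushes', 'Nail Polish'] [(1, 100.0), (2, 53.22)]
```

Rules that land in the same group share their support (a U c) by construction, so any difference between them lies in the confidence, which is what a reviewer would then compare.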

What is their utility? The rules complement each other and confirm the validity and practicality of the ideas behind the data mining task of association rules.

