Comparative Analysis of HUI-PR and EFIM for High Utility Itemset Discovery

Abstract

This lab report presents experiments that evaluate two methods for discovering high utility itemsets (HUIs), HUI-PR and EFIM, on several real datasets. The methods are compared on three measures: computational time, the number of HUIs found, and the number of candidate sets generated.

1. Introduction

In this experiment, we compared the HUI-PR method with the EFIM method on multiple datasets. The goal was to determine which method is more efficient in discovering high utility itemsets.

The experiments were conducted on different datasets with varying characteristics, including dataset size, density, and itemset types.

2. Methodology

2.1 Datasets

The experiments were performed on a range of real datasets, including Chess, Connect, Retail, Connect2x, Chess30x, BMS4x, and Mushroom20x. These datasets were chosen to represent both small and large datasets with different characteristics. Table 4.1 provides detailed characteristics of these datasets.

Table 4.1. Characteristics of the datasets

Dataset        Transactions   Items   Avg. length   Max. length   Type     Scale
Chess                  3196      76            37            37   Dense    Small
Connect               67557     129            43            43   Dense    Small
Retail                88162   16470            10            76   Sparse   Medium
Connect2x            135114     129            43            43   Dense    Large
Chess30x              95880      76            37            37   Dense    Large
BMS4x                238408     497             3           267   Sparse   Large
Mushroom20x          162400     119            23            23   Dense    Large
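
The Dense and Sparse labels in Table 4.1 can be related to the other columns with a short calculation. The sketch below assumes a common density measure, average transaction length divided by the number of distinct items; the report does not state which definition it used, so the function name and the formula are illustrative only.

```python
# Figures copied from Table 4.1: (dataset, number of items, average length).
DATASETS = [
    ("Chess", 76, 37),
    ("Connect", 129, 43),
    ("Retail", 16470, 10),
    ("Connect2x", 129, 43),
    ("Chess30x", 76, 37),
    ("BMS4x", 497, 3),
    ("Mushroom20x", 119, 23),
]


def density(num_items, avg_length):
    """Assumed measure: share of all distinct items present in an average transaction."""
    return avg_length / num_items


for name, num_items, avg_length in DATASETS:
    print(f"{name:<12} {density(num_items, avg_length):7.2%}")
```

Under this assumed measure, the datasets labelled Dense come out at roughly 19% to 49%, while Retail and BMS4x stay below 1%, which agrees with the Type column.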

3. Results

3.1 HUI-PR versus EFIM

3.1.1 Comparison of Computational Time

The computational time of HUI-PR and EFIM was compared on the Connect, Chess, and Retail datasets. HUI-PR was faster, with the largest gains on datasets that contain many transactions. It reduces the number of transactions considered at each level by using a pruning hash table to eliminate unnecessary transactions. Figure 4.1 illustrates the comparison of computational time between HUI-PR and EFIM for different threshold ratios.
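
The report does not describe how HUI-PR's pruning hash table works internally, so the following is not its implementation. As a rough illustration of the general idea of shrinking the transaction set, the sketch below uses the standard transaction-weighted utilization (TWU) bound from the high utility mining literature: an item whose TWU is below the minimum utility cannot belong to any HUI, so it can be removed, and transactions left empty can be dropped. The toy database, the function names, and the use of a Python dict as the hash table are all assumptions made for the example.

```python
from collections import defaultdict


def transaction_utility(transaction):
    """Sum of the utilities of all items in one transaction."""
    return sum(transaction.values())


def prune_by_twu(database, min_util):
    """Remove items whose TWU is below min_util, then drop empty transactions.

    database: list of dicts mapping item -> utility of that item in the transaction.
    Returns the pruned database and the TWU table (a plain hash table).
    """
    twu = defaultdict(int)
    for transaction in database:
        tu = transaction_utility(transaction)
        for item in transaction:
            twu[item] += tu  # TWU(item) = sum of utilities of transactions containing it

    pruned = []
    for transaction in database:
        kept = {i: u for i, u in transaction.items() if twu[i] >= min_util}
        if kept:  # transactions with no promising item left are discarded
            pruned.append(kept)
    return pruned, dict(twu)


# Hypothetical toy database, not taken from the report.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"b": 4, "c": 3},
    {"d": 1},
    {"a": 5, "c": 1, "d": 2},
]

pruned_db, twu_table = prune_by_twu(DB, min_util=10)
print(twu_table)
print(pruned_db)
```

In this toy run, item d has a TWU of 9 and is removed, which empties the third transaction and shortens the fourth. Fewer and shorter transactions mean less work at every later level of the search.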

For example, on the 'Connect' dataset at a threshold ratio of 28.90%, HUI-PR took 1830.87 seconds, while EFIM took 1927.95 seconds, a reduction of about 5%. Similarly, on the 'Retail' dataset at a threshold ratio of 0.03%, HUI-PR took 5718.36 seconds, while EFIM took 7370.33 seconds, a reduction of about 22%. In both cases HUI-PR was faster than EFIM, with the larger gain on the dataset that has more transactions.
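
The relative differences implied by the two pairs of timings quoted above can be worked out directly. The snippet below only restates the reported numbers; the helper names are illustrative.

```python
# Reported run times in seconds: (dataset, threshold ratio) -> (HUI-PR, EFIM).
TIMINGS = {
    ("Connect", "28.90%"): (1830.87, 1927.95),
    ("Retail", "0.03%"): (5718.36, 7370.33),
}


def time_saved(hui_pr, efim):
    """Fraction of EFIM's run time that HUI-PR saves."""
    return (efim - hui_pr) / efim


def speedup(hui_pr, efim):
    """How many times faster HUI-PR is than EFIM."""
    return efim / hui_pr


for (dataset, ratio), (hui_pr, efim) in TIMINGS.items():
    print(f"{dataset} @ {ratio}: "
          f"saved {time_saved(hui_pr, efim):.1%}, "
          f"speedup {speedup(hui_pr, efim):.2f}x")
```

This gives a saving of roughly 5% (about 1.05x) on Connect and roughly 22% (about 1.29x) on Retail.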

3.1.2 Comparison of HUIs

The number of high utility itemsets (HUIs) found by HUI-PR and by EFIM was compared on the Connect, Chess, and Retail datasets. Both methods discovered the same number of HUIs at every threshold ratio tested. Table 4.2 presents the number of HUIs found for various threshold ratios for each dataset. This indicates that HUI-PR is as effective as EFIM in discovering HUIs.
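
To make concrete what is being counted, here is a minimal brute-force sketch that enumerates every HUI in a tiny database. It assumes the usual convention that the threshold ratio is the minimum utility expressed as a fraction of the total utility of the database; the report does not define the term, and the toy data and function names are hypothetical. A reference enumerator like this is only feasible on very small inputs, but it is one way to check that two miners return the same itemsets rather than merely the same count.

```python
from itertools import combinations


def itemset_utility(itemset, database):
    """Total utility of an itemset over all transactions that contain every item."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] for item in itemset)
    return total


def brute_force_huis(database, threshold_ratio):
    """Return {itemset: utility} for every itemset meeting the minimum utility."""
    total_utility = sum(sum(t.values()) for t in database)
    min_util = threshold_ratio * total_utility  # assumed meaning of "threshold ratio"
    items = sorted({item for t in database for item in t})
    huis = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, database)
            if utility >= min_util:
                huis[itemset] = utility
    return huis


# Hypothetical toy database: item -> utility of that item in the transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"b": 4, "c": 3},
    {"d": 1},
    {"a": 5, "c": 1, "d": 2},
]

reference = brute_force_huis(DB, threshold_ratio=0.4)
print(reference)
```

On this toy input the itemset (a, c) qualifies even though c alone does not. Utility is not monotone under set inclusion, which is why HUI miners rely on upper bounds rather than simple support-style pruning.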

3.1.3 Comparison of Candidate Sets

We compared the candidate sets generated by HUI-PR and EFIM. HUI-PR produced fewer candidate sets than EFIM, because it reduces them using transaction pruning techniques and a pruning hash table. Figure 4.3 illustrates the comparison of candidate sets between HUI-PR and EFIM, and Table 4.3 shows the total number of transactions pruned in HUI-PR for different datasets and threshold ratios. The smaller candidate count means that HUI-PR performs fewer unnecessary computations.

3.2 Comparison with State-of-the-Art Algorithms

We also compared HUI-PR with state-of-the-art algorithms, including HUI-Miner, HUP-Miner, FHM, FHM+, and d2HUP. The results indicated that HUI-PR outperforms these algorithms significantly. For instance, for the 'Connect' dataset, HUI-PR performed over 100 times better than HUI-Miner, HUP-Miner, and FHM, and nearly 50 times better than d2HUP. Similar performance improvements were observed for the 'Chess' dataset.

4. Discussion

The experimental results show that HUI-PR is a highly efficient method for discovering high utility itemsets compared to EFIM and other state-of-the-art algorithms. It reduces computational time, generates fewer candidate sets, and performs as well as EFIM in terms of HUI discovery. These findings make HUI-PR a promising algorithm for mining high utility itemsets in large and dense datasets.

5. Conclusion

The experimental results demonstrate the effectiveness of the HUI-PR method in discovering high utility itemsets. It outperforms EFIM and the other state-of-the-art algorithms tested in terms of computational efficiency while finding the same number of HUIs as EFIM. HUI-PR has the potential to be a valuable tool in data mining applications that require the identification of high utility itemsets in various datasets.

Updated: Jan 17, 2024
Cite this page

Comparative Analysis of HUI-PR and EFIM for High Utility Itemset Discovery. (2024, Jan 17). Retrieved from https://studymoose.com/document/comparative-analysis-of-hui-pr-and-efim-for-high-utility-itemset-discovery
