Partial Weighted Count Tree for Discovery of Rare and Frequent Itemsets

Shwetha Rai1

Geetha M1, Email

Preetham Kumar2

Giridhar B1

1Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
2Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India

Abstract

Time and space utilization for discovering interesting patterns from a database plays an important role in analyzing information for major sectors like education, medicine, and e-business. Association Rule Mining (ARM) technique is used to discover associations among the patterns from large volumes of data. In most ARM algorithms, rare and frequent itemsets discovery is optimized by mining pruned databases stored in the main memory. However, in this case, any change in requirements would necessitate re-scanning of the database. Weighted Count tree (WC-Tree), and Single Scan Pattern tree (SSP-Tree) store the database in the main memory without pruning. WC-Tree stores the entire transaction as a node in the tree. However, if the weight is large, the actual information may be lost due to the precision error. In the current work, an efficient data structure, Partial Weighted Count Tree (PWC-Tree), is proposed to store the database as a complete and compact structure in the main memory without losing the information. The work revealed that PWC-Tree construction is in O(n2) for n transactions in the database. The experimental results show that, for a large dataset, the PWC-Tree is time as well as space-efficient when compared with WC-Tree and SSP-Tree.

Partial Weighted Count Tree for Discovery of Rare and Frequent Itemsets