Binary Count Tree: An Efficient and Compact Structure for Mining Rare and Frequent Itemsets

Shwetha Rai,1 

Geetha M.,1,*

Preetham Kumar2 

Giridhar B.1

1Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India

2Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India

Abstract

Rare and frequent itemsets can be discovered efficiently when the dataset to be processed is stored within the main memory. In recent years, various data structures have been developed to represent, in a compact form, large datasets that otherwise cannot be stored as a whole within the main memory. BIN-Tree, a tree data structure proposed in this paper, represents the entire dataset in a compact and complete form without any information loss. Each transaction is encoded and stored as a node in the tree, in contrast to existing algorithms, which store each item as a node. The efficiency of BIN-Tree for datasets of varying sizes and dimensions was evaluated against SSP-Tree and WC-Tree. The results show BIN-Tree to be 95% and 75% more space-efficient than SSP-Tree and WC-Tree, respectively. BIN-Tree construction and discovery of itemsets from a large dataset were found to be 93% and 22% more time-efficient than SSP-Tree and WC-Tree, respectively. BIN-Tree is equally efficient at discovering rare and frequent itemsets from a small dataset in the main memory.
