Monday, May 28, 2007

Closet: an efficient algorithm for mining frequent closed itemsets

@inproceedings{pei00closet,
author = {Jian Pei and Jiawei Han and Runying Mao},
title = {Closet: an efficient algorithm for mining frequent closed itemsets},
booktitle = {ACM SIGMOD Workshop on Research Issues in
Data Mining and Knowledge Discovery (DMKD 2000) },
year = {2000},
}

Mining frequent closed itemsets has the same power as mining the complete set of frequent itemsets. This reduces redundant rules to be generated and increases both efficiency and effectiveness of mining. If frequent pattern mining was used to generate roles, this effectively means generation of redundant roles are reduced.

The idea behind this approach is to use conditional databases in a divide and concur format. Given a list of all frequent items above min_sup, find conditional database of item in reverse order of support. Closed itemsets can be extracted iteratively from these conditional databases. Each conditional database may be further divided into more conditional databases.

The proposed approach extends the FP-growth tree approach for pattern discovery. Implementation requires FP tree and pruning for increased efficiency. Comparison with A-Close that uses Apriori to identify closed itemsets is basically the comparison between FP tree and Apriori.

The paper is not the easiest to read. It is filled with more theory and proofs than needed for a comprehensive understanding of the algorithm.

No comments: