Tuesday, November 14, 2006

Frequent pattern mining code

I wish to apply some frequent pattern mining techniques to my current area of security research. While Weka offers a large number of packages, it does not perform particularly well (or at all) when each transaction can take a large number of possible values. It does not scale to my data.

Also, for now, I only wish to extract frequent patterns and evaluate their use. Instead of implementing my own frequent pattern algorithm, I looked on the Internet for existing implementations. My first find was http://fimi.cs.helsinki.fi/, a repository of implementations from papers published in 2003/2004. They all perform different variations on pattern mining.

To narrow my search to something more directly aligned with my data type/implementation, Rao suggested I look at Han et al.'s Frequent Pattern Tree and at existing implementations of it. This approach is known as a very efficient way to identify frequent patterns, so it is expected that many implementations of it exist. It makes sense to take the most efficient existing implementation and adapt it to do what I want rather than write my own. This led me to the Frequent Itemset Mining Template Library. However, I had difficulty compiling it on my system.

Finally, I found another implementation based on Han et al.’s approach. This one was in C++ and required the input as one transaction per line, with all of that transaction's items on the line. This I could do.
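For illustration, here is a minimal Python sketch of producing that kind of input: one transaction per line, items on the same line. The space separator, the function name format_transactions, and the sample user/permission data are all assumptions made for this example; the actual implementation may expect a different delimiter or integer item IDs.

```python
def format_transactions(transactions, sep=" "):
    """Return text with one transaction per line, items joined by `sep`.

    Assumption: the miner accepts space-separated item names. Check the
    documentation of the implementation you use before relying on this.
    """
    lines = []
    for items in transactions:
        # Drop duplicate items and sort so each transaction has a stable form.
        unique_items = sorted(set(str(item) for item in items))
        lines.append(sep.join(unique_items))
    return "\n".join(lines) + "\n"


# Hypothetical example data: each user maps to the permissions they hold.
user_permissions = {
    "alice": ["read", "write"],
    "bob": ["read"],
    "carol": ["read", "write", "admin"],
}

text = format_transactions(user_permissions.values())
print(text, end="")
```

Writing the returned text to a file gives something that can be passed to a miner that reads this layout.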

So I translated my data into the format that the FPTree implementation required. The output gives each set of roles found together with its support, sorted by support. The next step is working out what to do with it. I will talk to Rao on Wednesday to see if he has any ideas. Perhaps what is required is identifying the best roles and trying to convert them into a hierarchy of some sort.
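To make "support" concrete, here is a small Python sketch that counts how many transactions contain each itemset and lists the frequent ones sorted by support. This is a brute-force enumeration, not the FP-tree algorithm, and it would not scale to large data; the function name, the max_size cap, and the sample transactions are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return (itemset, support) pairs with support >= min_support.

    Brute force: enumerates every subset of up to `max_size` items in
    each transaction. Fine for toy data, far too slow for real data.
    """
    counts = Counter()
    for items in transactions:
        unique_items = sorted(set(items))
        for size in range(1, max_size + 1):
            for combo in combinations(unique_items, size):
                counts[combo] += 1
    frequent = [(itemset, n) for itemset, n in counts.items() if n >= min_support]
    # Highest support first; ties broken by itemset for a stable order.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent


# Hypothetical transactions: the permissions held by three users.
transactions = [
    ["read", "write"],
    ["read"],
    ["read", "write", "admin"],
]

for itemset, support in frequent_itemsets(transactions, min_support=2):
    print(support, " ".join(itemset))
```

With this toy data, the sets that pass the threshold are {read} with support 3, then {read, write} and {write} with support 2 each.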

1 comment:

saawdhan: It's your voice said...

sir i am doing my project in Sequential pattern mining. And finding it difficult to get the code for the latest any sequential frequent pattern algorithm . Sir plz help me for getting the source code for the above fore.