Sunday, April 15, 2007

DMBI 2007 Keynote


Workshop on Data Mining and Business Intelligence
15 April 2007 - The Marmara Hotel, Istanbul, Turkey


Keynote: Jiawei Han - University of Illinois at Urbana-Champaign
Research Frontiers in Advanced Data Mining and Business Intelligence


The purpose of this keynote address was to give a general overview of current research areas in data mining. Technical detail was not covered. Instead, Jiawei highlighted the challenges and directions of the data mining topics he has had personal experience with.

Pattern mining was the first topic discussed. Frequent pattern mining came first, with Apriori, then FP-growth, Eclat and so on. Closed pattern mining developed from this because frequent pattern approaches can generate a very large number of patterns. FPClose, CHARM and max-pattern mining were mentioned. Also mentioned were correlation mining (PageRank from Google), compression of patterns and more compact representations of large result sets.
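To make the difference between frequent and closed patterns concrete, here is a minimal sketch of a level-wise, Apriori-style search followed by a closed-itemset filter. It was not shown in the talk: the function names, the toy baskets and the `min_support` threshold are all assumptions made for illustration, and the filter is a naive quadratic check rather than FPClose or CHARM.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search: return {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1:
                # Prune step: every k-subset must itself be frequent.
                if all(frozenset(s) in level
                       for s in combinations(union, len(a))):
                    candidates.add(union)
        current = list(candidates)
    return frequent


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        s: n for s, n in frequent.items()
        if not any(s < other and n == m for other, m in frequent.items())
    }


# Hypothetical toy data, not from the talk.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"milk"},
]
frequent = frequent_itemsets(baskets, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(closed), "closed")
```

On this toy data every non-empty combination of the three items is frequent, but only three itemsets are closed, which is the kind of reduction that motivates closed pattern mining.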

Information network analysis was also discussed. Graph mining and the mining of linked data, or cross-relational mining, fall into this category. One recent approach, developed by one of Jiawei's students, addresses the disambiguation of different people with the same name. In DBLP, there are 14 distinct people all with the name Wei Wang. All of them are in computer science and a large proportion are in data mining. It is difficult to determine which Wei Wang is being referred to just by looking at publication type. The proposed solution is to analyse co-author data. Based on this, fairly successful classifications were made. 13 distinct people were identified (one was misclassified as someone else) and there were 2-3 paper misclassifications for the other 13 people.
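The idea of using co-author data can be sketched very simply: treat two papers as belonging to the same person if they share a co-author, directly or through a chain of other papers. This is only an illustrative stand-in and not the method presented in the talk, which was not described in detail. The function name and the placeholder co-author labels are hypothetical.

```python
def cluster_by_coauthors(papers):
    """Group papers that share a co-author (connected components).

    `papers` is a list of co-author sets, one per paper attributed to
    the ambiguous name. Returns sorted lists of paper indices.
    """
    parent = list(range(len(papers)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # co-author name -> index of first paper listing them
    for idx, coauthors in enumerate(papers):
        for name in coauthors:
            if name in first_seen:
                # Shared co-author: merge the two papers' groups.
                parent[find(idx)] = find(first_seen[name])
            else:
                first_seen[name] = idx

    groups = {}
    for idx in range(len(papers)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Placeholder co-author sets for five papers by one ambiguous name.
papers = [
    {"coauthor_a", "coauthor_b"},
    {"coauthor_b", "coauthor_c"},
    {"coauthor_x"},
    {"coauthor_x", "coauthor_y"},
    {"coauthor_c"},
]
clusters = cluster_by_coauthors(papers)
print(clusters)
```

Here papers 0, 1 and 4 end up in one group and papers 2 and 3 in another. A real system would need weighting and more evidence than a single shared name, since one common co-author would otherwise merge two different people.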

Another area of data mining is stream data mining, for example of internet network traffic. The aim of stream data mining is prompt classification, as in network intrusion detection systems. One recent piece of work is the prompt update of stream data cubes based on statistical analysis.

Mining moving object data is also interesting. While the focus is not yet on prompt classification, analysing the movement of ships, cars and other objects can identify anomalies or predict future direction. For example, analysis of ship movements can potentially identify anomalies that could be terrorist vehicles or illegal shipping operations. Analysis of hurricane trajectories can predict their movement, and seasonal animal movement can be studied before major highways are built so that they cause the least disturbance to wildlife.

Spatial, temporal and multimedia data mining need to consider obstacles before correct classification can be performed.

Text and Web mining considers links between pages and keywords. A great deal of information is available, so it is important to extract the correct information. One way to extract important information is to identify key structure components. Graphics algorithms can be used to identify sections of useful information as indicators for the actual information.

Data mining for systems and software engineering deals with the clustering and classification of software bugs, relating where a bug is detected to where it actually resides.

Finally, data cube oriented, multidimensional online analytical processing (OLAP) was discussed and separated into four areas: regression cubes, prediction cubes, the integration of cubes with ranking query processing, and high-dimensional OLAP.
