Applied Data Mining for Business Intelligence

Niels Arnth-Jensen

Abstract: Business Intelligence (BI) solutions have for many years been a hot topic among companies because of their ability to optimize business processes and support decision making. The demand for more sophisticated and intelligent BI solutions is constantly growing because storage capacity grows at twice the rate of processing power. Over time, this imbalance will make data processing tasks more time-consuming when traditional BI solutions are used.

Data Mining (DM) offers a variety of advanced data processing techniques that can be applied beneficially for BI purposes. Applying them is far from simple and often requires customizing the DM algorithm to the BI purpose at hand. The comprehensive process of applying DM to a business problem is referred to as the Knowledge Discovery in Databases (KDD) process and is vital for successful DM implementations with BI in mind.

In this project, the emphasis is on developing a number of advanced DM solutions for data processing applications chosen in collaboration with the project partner, gatetrade.net. For gatetrade.net, the project is meant as an eye-opener to the world of advanced data processing and its advantages. gatetrade.net is the primary data supplier in the project. Since gatetrade.net develops and maintains e-trade solutions, the data is mainly transactional (order headers and lines).

Three different segmentation approaches (k-Nearest Neighbours (kNN), Fuzzy C-Means (FCM) and Unsupervised Fuzzy Partitioning - Optimal Number of Clusters (UFP-ONC)) have been implemented and evaluated in pursuit of a clustering algorithm with high, consistent performance. To determine the optimal number of segments in a data set, ten different cluster validity criteria have also been implemented and evaluated. To handle gatetrade.net data types, a Data Formatting Framework has been developed.
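As an illustration of the fuzzy clustering idea named above, the following is a minimal pure-Python sketch of the standard Fuzzy C-Means update rules together with the partition coefficient, one commonly used cluster validity index. It is not the implementation from the thesis: the function names (`fuzzy_c_means`, `partition_coefficient`), the default fuzzifier m = 2, the stopping tolerance and the random initialisation are all assumptions made for this example, and the abstract does not state which ten validity criteria were used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Hypothetical helper: textbook FCM on a list of equal-length tuples.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j. Assumes m > 1 and more distinct points than clusters.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u[i][j] ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total
                 for k in range(dim)]
            )

        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d2 = [sq_dist(points[i], centers[j]) for j in range(c)]
            if min(d2) == 0.0:
                # Point sits exactly on a centre: share full membership there.
                hits = [j for j in range(c) if d2[j] == 0.0]
                new = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Mean of squared memberships: 1/c for a uniform partition, 1 for a crisp one."""
    return sum(v * v for row in u for v in row) / len(u)
```

One possible way to use the sketch for choosing the number of segments is to run `fuzzy_c_means` for several values of `c` and compare `partition_coefficient(u)` across runs. The partition coefficient tends to favour small values of `c`, which is one reason to consult several validity criteria rather than a single one.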

The desired data processing applications are addressed using the capable UFP-ONC clustering algorithm (supported by the ten cluster validity criteria) along with a number of custom-developed algorithms and methods. For future use at gatetrade.net, a draft of a complete BI framework using some or all of the developed data processing algorithms is suggested.
Keywords: Business Intelligence, Data Mining, Knowledge Discovery in Databases, partition clustering algorithms, kNN, FCM, UFP-ONC, classification, cluster validity criteria.
Type: Master's thesis [Industrial collaboration]
Year: 2006
Publisher: Informatics and Mathematical Modelling, Technical University of Denmark, DTU
Address: Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby
Series: IMM-Thesis-2006-98
Note: Supervised by Associate Professor Jan Larsen, IMM-DTU. Thesis co-supervisors are Christian Leth, Project Chief, gatetrade.net A/S and Allan Eskling-Hansen, Chief Financial Officer, gatetrade.net A/S.
IMM Group(s): Intelligent Signal Processing