Data mining combines statistics and computer science: statistical methods are used to extract knowledge from large data sets (Big Data). There are two major kinds of problems, unsupervised learning and supervised learning.

Unsupervised learning tries to find structure in large data sets without predefined target values. This helps in many applications, such as business intelligence (BI), market segmentation, and the detection of outliers and risk factors.
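As a rough illustration of market segmentation, the sketch below groups customers with a plain k-means loop. The customer data, the function name k_means and the choice of the first k points as starting centroids are all assumptions made for this example; real tools use more robust initialisation and usually rescale the features first.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Group points into k clusters with a basic k-means loop."""
    # Assumption: the first k points act as initial centroids.
    # This keeps the example deterministic but is not robust in general.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend, visits per month)
customers = [(200, 1), (5000, 12), (220, 2), (5200, 11), (180, 1), (4800, 13)]
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

No target values are given to the algorithm; the two segments (low spenders and high spenders) emerge from the data alone. Because the features are not rescaled here, annual spend dominates the distance calculation.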

Supervised learning is used in BI as well. Here the data scientist focuses on making good forecasts for a specific problem, such as fraud detection or optical character recognition (OCR). Common methods include decision trees (CART), regression splines (MARS), artificial neural networks, ensemble models and many more.
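To make the idea of learning from labelled examples concrete, here is a minimal sketch of a one-level decision tree (a "stump") that picks a single split by Gini impurity, the criterion commonly used in CART-style classification trees. The transaction amounts, the fraud labels and the helper names gini, fit_stump and predict are hypothetical and chosen only for this example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best = None  # (score, threshold, left_class, right_class)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            # Each side predicts its majority class (ties go to class 1).
            left_class = int(2 * sum(left) >= len(left))
            right_class = int(2 * sum(right) >= len(right))
            best = (score, threshold, left_class, right_class)
    return best[1:]


def predict(stump, value):
    """Classify a single value with a fitted stump."""
    threshold, left_class, right_class = stump
    return left_class if value <= threshold else right_class


# Hypothetical training data: transaction amount and whether it was fraudulent.
amounts = [12, 25, 40, 60, 900, 1200, 1500, 30]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 0]

stump = fit_stump(amounts, is_fraud)
print(stump)
print(predict(stump, 2000), predict(stump, 50))
```

Unlike the unsupervised case, the known outcomes (is_fraud) drive the model: the split is chosen because it separates the labelled classes best. A full decision tree would apply the same search recursively to each side of the split and across several features.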

Data Mining Tools

The origin of data mining lies partly in statistics, but many statistical programs are not well equipped to deal with really big data. Most dedicated data mining and business intelligence tools are standalone programs that provide the important methods for unsupervised and supervised learning. Programs like AllClear are especially well suited to visualizing big data or business processes.

Of course there are exceptions. With the increasing relevance of business intelligence, the big players in statistical software (IBM SPSS Statistics, SAS, R) have focused more and more on handling big data.

Business Intelligence

Business Intelligence covers the systematic collection, preparation and analysis of business data. The goal of BI is to optimize business processes using the knowledge gained from the data analysis. This helps reduce costs, improve quality and lower risks. The most useful BI tools integrate seamlessly into the existing software environment.

Data Mining Software

A major feature of any data mining software is the ability to handle huge amounts of data ("Big Data").

