The phrase "data mining" comes up often in conversation, sometimes made to sound like the silver bullet you have been looking for. But what is Data Mining?
Data Mining is the process of discovering patterns, relationships, and other previously unknown information from large sets of data. The majority of Data Mining techniques are founded on the disciplines of Statistical Analysis (Statistics) or Machine Learning (ML).
Data Mining Statistics
Statistics can be roughly categorised into Descriptive and Inferential methods.
- Descriptive statistics aim at describing and summarising data in a meaningful way.
- Inferential statistics aim at drawing conclusions about a whole population when you don’t have access to all of it, but only to a limited sample.
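The difference between the two can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis: the `sample` values are made up, the function names `describe` and `mean_confidence_interval` are hypothetical, and the interval uses a normal approximation (for a sample this small, a t-distribution would usually be the more careful choice).

```python
import math
import statistics

# Hypothetical sample of measurements drawn from a larger population.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]


def describe(values):
    """Descriptive statistics: summarise the data we actually have."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def mean_confidence_interval(values, confidence=0.95):
    """Inferential statistics: estimate a range for the population mean.

    Assumption: a normal approximation is acceptable for illustration.
    """
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z-value for a two-sided interval at the requested confidence level
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(summary)
print(f"Estimated population mean lies between {low:.2f} and {high:.2f}")
```

The first function only describes the eight values in hand; the second makes a statement about the population they were drawn from, which is why it comes with an uncertainty range.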
Data Mining Machine Learning
Machine Learning is commonly categorised as either Supervised learning, which learns from labelled examples, or Unsupervised learning, which looks for structure in unlabelled data.
Supervised learning can be grouped into regression and classification problems.
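As a rough sketch of the two kinds of supervised problem, the example below fits a straight line to predict a number (regression) and uses a one-nearest-neighbour rule to predict a label (classification). The function names and the toy data are assumptions chosen for illustration; real projects would normally reach for an established library instead.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbour_label(examples, query):
    """Classification: return the label of the closest labelled example.

    `examples` is a list of (feature_value, label) pairs.
    """
    closest = min(examples, key=lambda example: abs(example[0] - query))
    return closest[1]


# Regression: the target is a numerical value.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
predicted_value = slope * 5 + intercept

# Classification: the target is a class label.
labelled = [(1.0, "small"), (2.0, "small"), (10.0, "large"), (12.0, "large")]
predicted_label = nearest_neighbour_label(labelled, 8.5)

print(predicted_value, predicted_label)
```

In both cases the model learns from examples where the answer is already known; the only difference is whether that answer is a number or a category.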
Data Mining Tasks
- Classification and class probability estimation attempt to predict to which class an item belongs.
- Regression aims to estimate or predict, for an item, the numerical value of certain variables.
- Similarity matching attempts to identify items that are similar on the basis of some data known about them.
- Clustering attempts to group items on the basis of some common characteristics.
- Co-occurrence grouping attempts to find associations between items based on the transactions in which they appear together.
- Profiling attempts to identify the typical behaviour of an individual or a group of individuals.
- Link prediction attempts to predict connections between objects and individuals, commonly in social networking systems.
- Data reduction attempts to replace a large set of data with a smaller one that retains much of the important information in the larger set but is easier to analyse.
- Causal modeling attempts to identify events or actions that really influence other events or actions.
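Three of the tasks above (similarity matching, co-occurrence grouping, and clustering) are simple enough to sketch with the Python standard library. The functions below are illustrative assumptions, not the methods described in the reference: cosine similarity stands in for similarity matching, pair counting for co-occurrence grouping, and a one-dimensional k-means for clustering.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Similarity matching: compare two items described as numeric vectors.

    Assumes neither vector is all zeros (that would divide by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def co_occurrence_counts(transactions):
    """Co-occurrence grouping: count how often item pairs share a transaction."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def kmeans_1d(values, centroids, iterations=10):
    """Clustering: group numbers around the nearest of the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical toy data for each task.
baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["eggs"]]
print(cosine_similarity([1, 0], [1, 0]))
print(co_occurrence_counts(baskets).most_common(1))
print(kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5]))
```

Note that none of these three functions needs labelled data, which is what places them on the unsupervised side of the split described earlier.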
Provost, F. and Fawcett, T., 2013. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O'Reilly Media, Inc.