REFORMULATING DATA MINING TASKS

These are the tasks that can be accomplished with the data mining techniques described in this book (though no single data mining tool or technique is equally applicable to all tasks). The first three tasks (classification, estimation, and prediction) are examples of directed data mining. Affinity grouping and clustering are examples of undirected data mining. Profiling may be either directed or undirected. In directed data mining there is always a target variable: something to be classified, estimated, or predicted. The process of building a classifier starts with a predefined set of classes and examples of records that have already been correctly classified. Similarly, the process of building an estimator starts with historical data in which the values of the target variable are already known.
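
To make the directed/undirected distinction concrete, here is a minimal Python sketch on a handful of invented records. It is an illustration under stated assumptions, not a method taken from this book: the data, and the names RECORDS, LABELS, classify, fit_estimator, and kmeans, are all hypothetical. The two directed parts (a 1-nearest-neighbor classifier and a least-squares line used as an estimator) cannot be built without known target values. The undirected part uses k-means, one common clustering technique, which sees only the input variables.

```python
import math

# Hypothetical toy data, invented for illustration: (monthly_spend, visits_per_month).
# Variables are left unscaled, so spend dominates the distance calculations.
RECORDS = [(20.0, 1.0), (25.0, 2.0), (22.0, 1.5), (80.0, 8.0), (85.0, 9.0), (90.0, 7.5)]

# Directed data mining: every training record has a known value of the target.
LABELS = ["low", "low", "low", "high", "high", "high"]


def classify(record, records, labels):
    """Directed: 1-nearest-neighbor classifier. A new record gets the class
    of the closest record that has already been correctly classified."""
    nearest = min(range(len(records)), key=lambda i: math.dist(record, records[i]))
    return labels[nearest]


def fit_estimator(xs, ys):
    """Directed: least-squares line y = a + b * x, fitted to historical data
    in which the target y is already known."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def kmeans(records, k, iterations=10):
    """Undirected: plain k-means clustering. No target variable is used;
    records are grouped only by similarity. The first k records serve as
    the initial centers, which keeps the sketch deterministic."""
    centers = list(records[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for r in records:
            j = min(range(k), key=lambda c: math.dist(r, centers[c]))
            clusters[j].append(r)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [
            tuple(sum(vals) / len(members) for vals in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    # Report the cluster index of each record.
    return [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in records]


print(classify((24.0, 1.8), RECORDS, LABELS))  # low
print(classify((88.0, 8.5), RECORDS, LABELS))  # high
estimate_visits = fit_estimator([r[0] for r in RECORDS], [r[1] for r in RECORDS])
print(round(estimate_visits(50.0), 2))
print(kmeans(RECORDS, 2))  # [0, 0, 0, 1, 1, 1]
```

Note that classify and fit_estimator are useless without LABELS or the known visit counts, while kmeans finds two groups from the input variables alone. Deciding what those groups mean is still left to the analyst.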

The modeling task is to find rules that explain the known values of the target variable.

In undirected data mining, there is no target variable. The data mining task is to find overall patterns that are not tied to any one variable. The most common form of undirected data mining is clustering, which finds groups of similar records without being told which variables should be considered most important. Undirected data mining is descriptive by nature, so undirected techniques are often used for profiling, but directed techniques such as decision trees are also very useful for building profiles. In the machine learning literature, directed data mining is called supervised learning and undirected data mining is called unsupervised learning.

How Will the Results Be Used?

This is one of the most important questions to ask when deciding how best to translate a business problem into a data mining problem. Surprisingly often, the initial answer is “we’re not sure.” An answer is important because, as the cautionary tale in the sidebar illustrates, different intended uses dictate different solutions.