- Statistical classification
- Guide to data mining

- Bias-variance tradeoff
- Scikit-Learn

- Principal component analysis
- Data Science for Business: What You Need to Know about Data Mining

- Data Mining with Weka
- Weka 3

Open-source Java tools for data preprocessing, classification, regression, clustering, association rules, and visualization

Principles of popular algorithms

Recommendation systems, classification, Naïve Bayes, clustering

Classification, regression, clustering, dimensionality reduction, model selection, and preprocessing; built on NumPy, SciPy, and matplotlib

The problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known.
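One of the simplest ways to make this definition concrete is a 1-nearest-neighbour classifier: the new observation receives the category of the closest training observation. The sketch below is only an illustration under that assumption; the function name, the toy training set, and the labels are hypothetical and not part of the original notes.

```python
import math

def nearest_neighbor_classify(training_set, observation):
    """Return the category of the training example closest to `observation`."""
    _, nearest_label = min(
        training_set,
        key=lambda example: math.dist(example[0], observation),
    )
    return nearest_label

# Hypothetical training set: (features, known category) pairs.
training_set = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

# Classify a new observation whose category is unknown.
prediction = nearest_neighbor_classify(training_set, (2.0, 1.5))
print(prediction)
```

Other classifiers (Naïve Bayes, decision trees, and so on) follow the same pattern: learn from labelled observations, then assign a category to an unseen one.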

The conflict in trying to simultaneously minimize the two sources of error, bias and variance, that prevent supervised learning algorithms from generalizing beyond their training set
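For squared-error loss, the tradeoff is usually written as a decomposition of the expected prediction error at a point $x$. The notation here is an assumption added for illustration: $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Simple models tend to have high bias and low variance; flexible models tend to have the reverse, which is why both terms cannot usually be driven to zero at once.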

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster
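A minimal sketch of the usual iterative procedure (Lloyd's algorithm) may help: alternate between assigning each observation to its nearest mean and recomputing each mean. The function name, the toy points, and the choice of passing initial centroids explicitly are assumptions made for illustration.

```python
import math

def k_means(points, initial_centroids, max_iterations=100):
    """Partition `points` into len(initial_centroids) clusters."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the means will not move
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = k_means(points, initial_centroids=[points[0], points[3]])
print(assignments)
```

The result depends on the initial centroids, so practical implementations typically run several initializations and keep the best partition.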

Phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings
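One such phenomenon is distance concentration: for random points, the nearest and farthest neighbours of a query become almost equally far away as the number of dimensions grows. The sketch below measures this with a "relative contrast" ratio on uniformly random points; the function name, sample size, and seed are assumptions chosen for illustration.

```python
import math
import random

def relative_contrast(dimensions, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

# Compare a low-dimensional and a high-dimensional setting.
contrast_low = relative_contrast(2)
contrast_high = relative_contrast(1000)
print(contrast_low, contrast_high)
```

A small contrast in high dimensions means "nearest" carries little information, which is one reason distance-based methods degrade there.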

Task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)

Dimensionality reduction: a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. It condenses the information in a large set of correlated variables into a few variables.
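As a rough illustration, the first principal component can be found as the leading eigenvector of the sample covariance matrix. The sketch below uses power iteration rather than a full eigendecomposition, and assumes the starting vector is not orthogonal to that eigenvector; the function name and the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return the leading principal direction and the projected scores."""
    n = len(rows)
    dims = len(rows[0])
    # Center each variable on its mean.
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    centered = [[row[d] - means[d] for d in range(dims)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    vector = [1.0] * dims
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Project the centered observations onto the component.
    scores = [sum(r[d] * vector[d] for d in range(dims)) for r in centered]
    return vector, scores

# Hypothetical data: two strongly correlated variables (y is roughly 2x).
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
component, scores = first_principal_component(rows)
print(component)
```

Here the two correlated variables are condensed into a single score per observation along the direction of greatest variance.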

A flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label; the paths from root to leaf represent classification rules.
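The structure described above can be sketched with nested dictionaries: internal nodes name the attribute being tested, branches map test outcomes to child nodes, and leaves are plain class labels. The tree contents, attribute names, and function names below are hypothetical and only illustrate the idea.

```python
# Hypothetical tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "attribute": "windy",
            "branches": {"yes": "stay in", "no": "play"},
        },
    },
}

def classify(node, observation):
    """Follow test outcomes from the root until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = observation[node["attribute"]]
        node = node["branches"][outcome]
    return node

def rules(node, conditions=()):
    """Yield one (conditions, label) classification rule per root-to-leaf path."""
    if not isinstance(node, dict):
        yield conditions, node
        return
    for outcome, child in node["branches"].items():
        yield from rules(child, conditions + ((node["attribute"], outcome),))

label = classify(tree, {"outlook": "sunny", "humidity": "normal"})
all_rules = list(rules(tree))
print(label, len(all_rules))
```

Each rule yielded by `rules` reads as "if these attribute tests hold, then predict this label", which is the root-to-leaf interpretation in the definition.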

A rule-based machine learning method for discovering interesting relations between variables in large databases
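Whether a relation counts as "interesting" is commonly judged with measures such as support (how often the items occur together) and confidence (how often the rule holds when its antecedent occurs). The sketch below computes both for a toy set of transactions; the data and function names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Evaluate the rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds, rather than scoring one rule at a time as done here.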

