# Data Mining

• Weka 3

Open-source Java tools for data preprocessing, classification, regression, clustering, association rules, and visualization

• Data Mining with Weka

Principles of popular algorithms

• Data Mining Concepts and Techniques Non-fictional

• Knowledge Discovery in Databases Techniken und Anwendungen Non-fictional

• Data Science For Business, What You Need to Know about Data Mining Non-fictional

• Scikit-Learn Software library

Classification, regression, clustering, dimensionality reduction, model selection, preprocessing, built on NumPy, SciPy, and matplotlib

• Linear Regression Wikipedia

A linear approach to modelling the relationship between a scalar response (the dependent variable) and one or more explanatory variables (the independent variables).
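As a rough illustration of the definition above, here is a minimal sketch of ordinary least squares with a single explanatory variable. The function name `fit_line` and the sample data are made up for this example and do not come from any of the listed resources.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ slope * x + intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With more than one explanatory variable the same idea applies, but the fit is usually computed with matrix methods rather than these closed-form sums.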

• Statistical classification Wikipedia

The problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known.
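One simple way to make this concrete is a nearest-neighbour classifier: a new observation gets the category of the closest training observation. This is only one of many classification methods; the function name and the toy training set below are assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def classify_nearest(training_set, observation):
    """Return the label of the training observation closest to `observation`."""
    _, label = min(training_set, key=lambda pair: sq_dist(pair[0], observation))
    return label


# Hypothetical training set: (features, known category).
training_set = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

print(classify_nearest(training_set, (2.0, 1.0)))
print(classify_nearest(training_set, (7.0, 9.0)))
```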

• Bias-variance tradeoff Wikipedia

The conflict in trying to simultaneously minimize bias and variance, the two sources of error that prevent supervised learning algorithms from generalizing beyond their training set
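The two sources of error are bias and variance. For squared-error loss the tradeoff is commonly written as the decomposition below. The notation is an assumption of this note: $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \big(\mathrm{Bias}[\hat{f}(x)]\big)^2
  + \mathrm{Var}[\hat{f}(x)]
  + \sigma^2
```

The $\sigma^2$ term is irreducible noise; a more flexible model typically lowers the bias term while raising the variance term.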

• K-means clustering Wikipedia

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster
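The description above can be sketched with Lloyd's algorithm, the usual iterative heuristic for k-means: assign each observation to the nearest mean, then recompute the means. The helper names, the caller-supplied starting centroids, and the fixed iteration count are simplifications chosen for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids (an assumption of this sketch)."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical data: two well-separated groups, k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why practical implementations usually run several random initialisations.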

• Cluster Analysis Wikipedia

The task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)

• Principal component analysis Wikipedia

Dimensionality reduction: a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components
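As a small illustration, the first principal component can be approximated by power iteration on the sample covariance matrix. This is a sketch under stated assumptions, not how libraries typically compute PCA (they generally use an SVD); the function name, the fixed starting vector, and the data are made up for the example.

```python
import math


def first_principal_component(rows, iterations=100):
    """Approximate the leading eigenvector of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Assumption: the all-ones start is not orthogonal to the leading
    # eigenvector; if it were, the norm below could become zero.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centered


# Hypothetical, perfectly correlated data: the second variable is twice the first.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
component, centered = first_principal_component(rows)
# One score per observation replaces the two original variables.
scores = [sum(c * v for c, v in zip(row, component)) for row in centered]
print(component)
print(scores)
```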

• Principal component analysis Wikipedia

condense the information of a large set of correlated variables into a few variables

• Decision tree Wikipedia

A flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label; the paths from root to leaf represent classification rules.
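The structure described above can be sketched as nested dictionaries: internal nodes name the attribute they test, branches are keyed by the outcome, and leaves are class labels. The tree below is hand-written for illustration (a weather-style toy example), not learned from data, and the field names `attribute` and `branches` are assumptions of this sketch.

```python
# Internal nodes are dicts; leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy", "branches": {True: "no", False: "yes"}},
    },
}


def classify(node, example):
    """Follow one root-to-leaf path by applying each node's attribute test."""
    while isinstance(node, dict):
        outcome = example[node["attribute"]]
        node = node["branches"][outcome]
    return node


example = {"outlook": "sunny", "humidity": "high", "windy": False}
print(classify(tree, example))
```

Each root-to-leaf path reads as a rule, for instance: if outlook is sunny and humidity is high, then the class is "no".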

• Association rule learning Wikipedia

A rule-based machine learning method for discovering interesting relations between variables in large databases
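Two common measures of how "interesting" a rule is are support and confidence. The sketch below computes both for a rule of the form antecedent => consequent over a tiny, made-up list of transactions; the function names and the data are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds instead of scoring one rule at a time.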


