Fit and predict

Machine learning in Python is deceptively easy and convenient.

For almost any scikit-learn model, here called clf, one simply calls:

# Fit the model to the features X and the targets y
clf.fit(X, y)

# Use the fitted model to predict y for the rows of X
clf.predict(X)

Fitting a DecisionTreeClassifier to the well-known Iris flower data set looks like this:

# Import the built-in Iris data set from scikit-learn
from sklearn.datasets import load_iris
# Import a classifier, in this case a decision tree
from sklearn.tree import DecisionTreeClassifier

# Load the Iris features (X) and class labels (y)
X, y = load_iris(return_X_y=True)

# Create the classifier
clf = DecisionTreeClassifier()

# Fit the classifier to the data
clf.fit(X, y)

# Predict the class of the first two examples
clf.predict(X[:2])
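
Because scikit-learn classifiers share the same fit and predict methods, swapping one model for another usually means changing a single line. The sketch below illustrates this with two additional classifiers, KNeighborsClassifier and LogisticRegression. They are chosen here purely as examples and do not appear in the text above, and the parameter values (n_neighbors=5, max_iter=1000, random_state=0) are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three different classifiers (illustrative choices), one shared interface
models = [
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(max_iter=1000),
]

predictions = {}
for clf in models:
    # The same two calls work regardless of the model type
    clf.fit(X, y)
    predictions[type(clf).__name__] = clf.predict(X[:2])

for name, predicted in predictions.items():
    print(name, predicted)
```

Each entry in predictions holds the predicted class labels for the first two rows of X, so the models can be compared side by side.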

This interface makes it remarkably easy to use these relatively advanced statistical models. However, it's never a good idea to just throw a model at your data.

Make sure that the model's assumptions are met for your data.
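
One basic sanity check, offered here as a suggestion rather than something the text above prescribes, is to hold out part of the data and compare accuracy on the training rows with accuracy on the unseen rows. The sketch below uses train_test_split with an assumed 30% test share and a fixed random_state for reproducibility. It does not verify a model's assumptions directly, but a large gap between the two scores is a hint that the model does not suit the data as well as the training fit suggests.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows (an assumed split size) for evaluation,
# keeping the class proportions the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# score() returns the mean accuracy for classifiers
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
```

Evaluating only on the rows used for fitting, as in clf.predict(X) after clf.fit(X, y), tends to overstate how well the model will do on new data.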
