Backtick Knowledge Base

Fit and predict

Machine learning in Python is deceptively easy and convenient.


Last updated 4 years ago


For any scikit-learn model (here called clf), with features X and targets y, one simply calls:

# Fit the model to the data
clf.fit(X,y)

# Use the model to predict y
clf.predict(X)

Fitting a DecisionTreeClassifier to the famous Iris flower data set can be done as:

# Import the built-in iris data from scikit-learn
from sklearn.datasets import load_iris
# Import a classifier, in this case a decision tree
from sklearn.tree import DecisionTreeClassifier

# Grab the iris data as a feature matrix X and a target vector y
X, y = load_iris(return_X_y=True)

# Create the classifier
clf = DecisionTreeClassifier()

# Simply fit the classifier to your data
clf.fit(X, y)

# Predict the class of the first two examples
clf.predict(X[0:2, :])
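
Because the interface is shared, other scikit-learn estimators can be swapped in without changing the surrounding code. The following is a minimal sketch of that idea; the choice of KNeighborsClassifier and LogisticRegression, and the parameter values, are illustrative assumptions and not recommendations from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three different model families (illustrative picks), one shared interface
models = [
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(max_iter=1000),
]

predictions = {}
for clf in models:
    # The same two calls work for every estimator
    clf.fit(X, y)
    predictions[type(clf).__name__] = clf.predict(X[0:2, :])

for name, pred in predictions.items():
    print(name, pred)
```

Only the line that constructs the model changes; fit and predict stay the same.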

This interface makes it very easy to use these relatively advanced statistical models. However, it's never a good idea to just throw a model at your data.

Make sure that the model's assumptions are met for your data.
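
One basic sanity check, sketched below, is to score the model on data it has not seen during fitting. This does not verify a model's assumptions by itself, but it shows how predicting on the training data (as in the iris example) can look better than the model really is. The 30% test split and the random_state values are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data (an arbitrary choice), keeping class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Accuracy on the data used for fitting versus on unseen data
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

An unconstrained decision tree can memorize its training data, so the training accuracy is typically at or near 1.0 and says little about how the model behaves on new examples.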
