Pipeline
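A machine learning workflow typically consists of several stages, such as encoding, scaling, feature selection, and fitting a model. A pipeline is a convenient tool for ensuring that every stage is applied in the correct order, and that preprocessing steps are fitted on the training data only.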
# Import standard scaler and logistic regression
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Import iris data set and train/test splitter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load iris data
X, y = load_iris(return_X_y=True)
# Split data into train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
# Create scaler and classifier
scaler = StandardScaler()
clf = LogisticRegression()
# First fit the scaler on X_train and then transform it
X_train_trans = scaler.fit_transform(X_train)
# Then fit the classifier
clf.fit(X_train_trans, y_train)
# Transform X_test with the already-fitted scaler; do NOT refit the scaler on test data
X_test_trans = scaler.transform(X_test)
# Finally, score the classifier's accuracy on the test set
clf.score(X_test_trans, y_test)
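The manual steps above can be collapsed into a single estimator. The following is a minimal sketch, assuming scikit-learn's `make_pipeline` helper from `sklearn.pipeline` (not used in the original code) and the same iris split as above. The variable names `pipe` and `accuracy` are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as in the manual example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Chain the scaler and the classifier into one estimator
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# fit() fits the scaler on X_train, transforms X_train,
# and then fits the classifier on the transformed data
pipe.fit(X_train, y_train)

# score() transforms X_test with the already-fitted scaler
# (no refitting) before computing the classifier's accuracy
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the pipeline only ever calls `transform` on the scaler at prediction time, the risk of accidentally refitting the scaler on test data is removed.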