The Bias-Variance Decomposition

A theoretical decomposition of the prediction error that shows the trade-off between bias and variance.

This page briefly summarizes the bias-variance decomposition of the prediction error. The decomposition breaks the prediction error into its components and, most importantly, exposes the trade-off between bias and variance discussed below. The topic is covered in greater detail in chapter 7.3 of The Elements of Statistical Learning.

Given a dataset with features X and target variables Y, assume that there exists a function f such that

$$Y = f(X) + \epsilon$$

where $\epsilon$ is the unexplainable part of the error, with

$$E(\epsilon) = 0, \qquad Var(\epsilon) = \sigma_\epsilon^2$$

Further assuming that the target variables are real-valued and using the squared-error loss, the expected prediction error at an observation $x_0$, for an estimate $\hat{f}$ of $f$, can be written as:

Err(x0)=E[(y0−f^(x0))2∣x0]=σϵ2+E[f^(x0)−f(x0)]2+E[f^(x0)−E[f^(x0)]]2Err(x_0) = E[(y_0-\hat{f}(x_0))^2|x_0]\\=\sigma^2_\epsilon + E[\hat{f}(x_0)-f(x_0)]^2+E[\hat{f}(x_0)-E[\hat{f}(x_0)]]^2Err(x0​)=E[(y0​−f^​(x0​))2∣x0​]=σϵ2​+E[f^​(x0​)−f(x0​)]2+E[f^​(x0​)−E[f^​(x0​)]]2

Note that by the definition of bias we get

biasθ=E[θ^−θ]  ⟹  E[f^(x0)−f(x0)]2=biasf^2bias_\theta=E[\hat{\theta}-\theta] \implies \\ E[\hat{f}(x_0)-f(x_0)]^2 = bias_{\hat{f}}^2biasθ​=E[θ^−θ]⟹E[f^​(x0​)−f(x0​)]2=biasf^​2​

And by the definition of variance:

Var[θ^]=E[θ^−E[θ^]]2Var[\hat\theta] =E[\hat\theta-E[\hat\theta]]^2 Var[θ^]=E[θ^−E[θ^]]2

for any estimator $\hat{\theta}$ of $\theta$, it follows that

σϵ2+E[f^(x0)−f(x0)]2+E[f^(x0)−E[f^(x0)]]2=σϵ2+biasf2+variancef\sigma^2_\epsilon + E[\hat{f}(x_0)-f(x_0)]^2+E[\hat{f}(x_0)-E[\hat{f}(x_0)]]^2 \\ =\sigma^2_\epsilon + bias_f^2 + variance_fσϵ2​+E[f^​(x0​)−f(x0​)]2+E[f^​(x0​)−E[f^​(x0​)]]2=σϵ2​+biasf2​+variancef​

which means that the prediction error at an observation is the sum of the irreducible variance, the squared bias, and the variance of the estimate $\hat{f}$.
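
A small simulation can check the decomposition numerically. Everything in the setup below is an illustrative assumption rather than part of the text: the true function is taken to be $f(x) = \sin(x)$, the noise level is $\sigma_\epsilon = 0.3$, and $\hat{f}$ is a straight-line fit, a deliberately biased model. Bias and variance at a point $x_0$ are estimated by refitting on many fresh training samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the text): the truth is f(x) = sin(x),
# the noise has sd sigma = 0.3, and f-hat is a degree-1 polynomial fit,
# i.e. a deliberately biased model.
f, sigma = np.sin, 0.3
x0, n_train, n_sims = 1.0, 30, 5000

preds = np.empty(n_sims)
for i in range(n_sims):
    x = rng.uniform(0.0, np.pi, n_train)
    y = f(x) + rng.normal(0.0, sigma, n_train)         # fresh training sample
    f_hat = np.polynomial.Polynomial.fit(x, y, deg=1)  # refit the model
    preds[i] = f_hat(x0)                               # its prediction at x0

bias_sq = (preds.mean() - f(x0)) ** 2    # squared systematic error
variance = preds.var()                   # spread of f-hat(x0) across refits

# Direct Monte Carlo estimate of Err(x0) = E[(y0 - f-hat(x0))^2 | x0]
y0 = f(x0) + rng.normal(0.0, sigma, n_sims)
err_direct = np.mean((y0 - preds) ** 2)

print(f"sigma^2 + bias^2 + variance = {sigma**2 + bias_sq + variance:.4f}")
print(f"direct estimate of Err(x0)  = {err_direct:.4f}")
```

With enough simulations the two printed numbers should agree to within Monte Carlo noise.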

As seen in the equations above, the bias is the systematic error of an estimator. High-bias models make rigid assumptions and do not allow a flexible fit to the data. Models of higher complexity fit the data more flexibly and therefore have lower bias, but a fit that follows the data closely can vary from sample to sample, which increases the variance of the fitted model. There is a trade-off between bias and variance: the ideal model captures just enough of the structure in the data to reduce bias without also fitting the noise and inflating its variance.

Note that increasing complexity always lowers the training error of a model, but beyond some point the growth in variance outweighs the reduction in bias and the prediction error rises again. Training error is therefore a poor guide for model selection, and great care must be taken when selecting and designing a model. The sketch below illustrates this on the same simulated setup.
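
As a rough illustration of the trade-off, again under the assumed sinusoidal truth and noise level from the sketch above, with polynomial degree standing in for model complexity, the following loop tracks training error, squared bias, and variance as the degree grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative setup as above; polynomial degree plays the role of
# model complexity.
f, sigma, n_train, n_sims = np.sin, 0.3, 30, 2000
x_grid = np.linspace(0.0, np.pi, 50)  # points over which bias/variance are averaged

for deg in (1, 3, 7):
    train_mse = np.empty(n_sims)
    preds = np.empty((n_sims, x_grid.size))
    for i in range(n_sims):
        x = rng.uniform(0.0, np.pi, n_train)
        y = f(x) + rng.normal(0.0, sigma, n_train)
        f_hat = np.polynomial.Polynomial.fit(x, y, deg)
        train_mse[i] = np.mean((y - f_hat(x)) ** 2)  # error on the training set
        preds[i] = f_hat(x_grid)
    bias_sq = np.mean((preds.mean(axis=0) - f(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"deg={deg}: train MSE={train_mse.mean():.3f}  "
          f"bias^2={bias_sq:.4f}  variance={variance:.4f}  "
          f"Err est={sigma**2 + bias_sq + variance:.3f}")
```

In runs of this sketch the training MSE falls monotonically with the degree, while the estimated $Err$ is typically smallest at the intermediate degree: the low-degree fit is dominated by bias and the high-degree fit by variance.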

🍂 The Elements of Statistical Learning (Hastie, Tibshirani & Friedman), chapter 7.3