# The Bias-Variance Decomposition

This page briefly summarizes the bias-variance decomposition of the prediction error. The decomposition breaks the prediction error into its components and highlights the tradeoff between bias and variance, discussed below. The topic is covered in greater detail in [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/), chapter 7.3.

Given a dataset with features X and target variables Y, assume that there exists a function f such that

$$
Y = f(X) + \epsilon
$$

where 𝜖 is the unexplainable (irreducible) part of the error, with

$$
E(\epsilon) = 0, \quad Var(\epsilon) = \sigma_\epsilon^2
$$
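
As a concrete illustration, here is a minimal NumPy sketch that draws data from such a model. The choice of a sine as the ground truth f, the noise level, and the sample size are all illustrative assumptions rather than part of the setup above:

```python
import numpy as np

rng = np.random.default_rng(0)

f = np.sin       # assumed ground-truth function f
sigma_eps = 0.3  # assumed noise standard deviation
n = 100          # assumed sample size

# Simulate Y = f(X) + eps with E(eps) = 0 and Var(eps) = sigma_eps^2.
X = rng.uniform(0, 2 * np.pi, n)
eps = rng.normal(0, sigma_eps, n)
Y = f(X) + eps
```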

Further assuming that the target variables are real-valued and using the squared-error loss, the expected prediction error at an observation x0 for a fitted function f-hat can be written as:

$$
Err(x_0) = E[(y_0-\hat{f}(x_0))^2 \mid x_0] \\
= \sigma^2_\epsilon + \big(E[\hat{f}(x_0)]-f(x_0)\big)^2 + E\big[(\hat{f}(x_0)-E[\hat{f}(x_0)])^2\big]
$$
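
To see where the terms come from, substitute y0 = f(x0) + 𝜖 and add and subtract the mean fitted prediction inside the square; since E(𝜖) = 0 and 𝜖 is independent of the fit, all cross terms vanish in expectation:

$$
E[(y_0-\hat{f}(x_0))^2 \mid x_0] = E[(f(x_0)+\epsilon-\hat{f}(x_0))^2] \\
= \sigma^2_\epsilon + E[(f(x_0)-\hat{f}(x_0))^2] \\
= \sigma^2_\epsilon + \big(f(x_0)-E[\hat{f}(x_0)]\big)^2 + E\big[(\hat{f}(x_0)-E[\hat{f}(x_0)])^2\big]
$$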

Note that by the definition of bias we get

$$
bias_{\hat{\theta}} = E[\hat{\theta}-\theta] \implies \big(E[\hat{f}(x_0)]-f(x_0)\big)^2 = bias_{\hat{f}}^2
$$

And by the definition of variance:

$$
Var[\hat\theta] = E\big[(\hat\theta-E[\hat\theta])^2\big]
$$
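
Both definitions can be checked numerically. As a small illustrative sketch (all values here are assumed), take the maximum-likelihood estimator of a variance, which is known to be biased:

```python
import numpy as np

rng = np.random.default_rng(2)

theta = 2.0              # true variance of the sampled population (assumed)
n, n_repeats = 5, 100_000

# Repeatedly draw samples and compute the MLE of the variance (ddof=0),
# which has expectation theta * (n - 1) / n and hence bias -theta / n.
samples = rng.normal(0.0, np.sqrt(theta), size=(n_repeats, n))
estimates = samples.var(axis=1)

print("bias     ~", estimates.mean() - theta)  # theory: -theta/n = -0.4
print("variance ~", estimates.var())           # E[(theta_hat - E[theta_hat])^2]
```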

Since these definitions hold for any estimator of 𝜃, it follows that

$$
\sigma^2_\epsilon + \big(E[\hat{f}(x_0)]-f(x_0)\big)^2 + E\big[(\hat{f}(x_0)-E[\hat{f}(x_0)])^2\big] \\
= \sigma^2_\epsilon + bias_{\hat{f}}^2 + Var[\hat{f}(x_0)]
$$

which means that the prediction error at an observation is the sum of the unexplainable (irreducible) variance, the squared bias of the fitted function f-hat, and the variance of f-hat.

As seen in the equations above, the bias is the systematic error of an estimator. High-bias models make rigid assumptions and do not allow for a flexible fit to the data. Models with higher complexity allow for a more flexible fit and as such have lower bias. By fitting the data more closely, however, the fit may vary from sample to sample, which increases the variance of the fitted model. There is a trade-off between bias and variance: an ideal model captures just enough of the complexity in the data to reduce bias without fitting the noise and increasing its variance.
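
To make the trade-off concrete, here is a minimal NumPy sketch that estimates the squared bias and the variance of polynomial fits at a single point by refitting on many freshly drawn training sets. The ground truth, noise level, degrees, and sample sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

f = np.sin                   # assumed ground-truth function
sigma_eps = 0.3              # assumed noise standard deviation
x0 = 1.0                     # test point at which the error is decomposed
n_train, n_repeats = 30, 500

for degree in (1, 3, 9):     # low, moderate, and high model complexity
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # Draw a fresh training set from Y = f(X) + eps and fit f-hat.
        x = rng.uniform(0, 2 * np.pi, n_train)
        y = f(x) + rng.normal(0, sigma_eps, n_train)
        coef = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coef, x0)
    bias_sq = (preds.mean() - f(x0)) ** 2  # (E[f-hat(x0)] - f(x0))^2
    variance = preds.var()                 # E[(f-hat(x0) - E[f-hat(x0)])^2]
    err = sigma_eps**2 + bias_sq + variance
    print(f"degree={degree}: bias^2={bias_sq:.4f}, "
          f"variance={variance:.4f}, Err(x0)={err:.4f}")
```

With these settings, the rigid degree-1 fit tends to show the largest squared bias, the flexible degree-9 fit the largest variance, and a moderate degree the smallest total error.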

Note that increasing model complexity lowers the training error while increasing the variance, and beyond some point the prediction error rises again. Therefore, great care must be taken when selecting and designing a model; see the sketch below.
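
As a companion sketch (same illustrative setup as above), comparing the training error with the error on a held-out validation set shows the training error falling monotonically with degree while the validation error typically follows a U-shape:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical train/validation split drawn from the same assumed model.
x_train = rng.uniform(0, 2 * np.pi, 40)
y_train = np.sin(x_train) + rng.normal(0, 0.3, 40)
x_val = rng.uniform(0, 2 * np.pi, 200)
y_val = np.sin(x_val) + rng.normal(0, 0.3, 200)

for degree in range(1, 10):
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coef, x_val) - y_val) ** 2)
    print(f"degree={degree}: train MSE={train_mse:.3f}, val MSE={val_mse:.3f}")
```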

---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.backtick.se/machine-learning/model-evaluation-and-selection/the-bias-variance-decomposition.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
