# Kullback–Leibler Divergence

[Count Bayesie](https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained) gives an excellent explanation and example of Kullback-Leibler divergence for the interested reader. Count Bayesie describes Kullback-Leibler divergence, *KL divergence*, as a measure of how much information is lost when using an approximation. KL divergence is therefore very important for machine learning, since every fitted model is an approximation of the true underlying relationship between the features and the target variable.

### Entropy

Before explaining KL divergence, a short recap of entropy might be needed. Remember, entropy measures how much information is gained, on average, by observing the outcome of a random variable. The entropy of a random variable X is defined as:

$$
H(X) = -\sum_i P(x_i) \log P(x_i)
$$
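
As a quick illustration, the entropy of a small discrete distribution can be computed directly from the formula. A minimal sketch in Python with NumPy, using a made-up distribution and the natural logarithm:

```python
import numpy as np

# A hypothetical discrete distribution over four outcomes (probabilities sum to 1).
p = np.array([0.5, 0.25, 0.125, 0.125])

# H(X) = -sum_i P(x_i) * log P(x_i), here in nats (natural log).
entropy = -np.sum(p * np.log(p))
print(entropy)  # ~1.213
```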

### Kullback-Leibler Divergence

Kullback-Leibler divergence is a modification of the entropy formula:

$$
KL(P||Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}
$$
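
As a sanity check, the sum can be evaluated directly. A minimal sketch, using two made-up distributions over the same four outcomes (both distributions are illustrative assumptions):

```python
import numpy as np

# Hypothetical reference distribution P and approximation Q over the same support.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

# KL(P||Q) = sum_x P(x) * log(P(x) / Q(x))
kl_pq = np.sum(p * np.log(p / q))
print(kl_pq)  # ~0.173
```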

Note that, taking P as the reference distribution, this is the expected log-difference between the reference and the approximation:

$$
KL(P||Q) = E_P[\log P(x) - \log Q(x)]
$$

which by the linearity of expectation can be written as

$$
KL(P||Q) = E_P[\log P(x)] - E_P[\log Q(x)]
$$
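
The two forms can be checked numerically. A short sketch with the same made-up P and Q as above:

```python
import numpy as np

# Same hypothetical P and Q as in the earlier sketch.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Direct sum form.
kl_direct = np.sum(p * np.log(p / q))

# Expectation form: E_P[log P(x)] - E_P[log Q(x)].
kl_split = np.sum(p * np.log(p)) - np.sum(p * np.log(q))

print(np.isclose(kl_direct, kl_split))  # True
```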

If P, i.e. the reference or *true* distribution, is known, the KL divergence is useful for selecting the most representative approximation Q. Sadly, in machine learning the true distribution is rarely known.

The KL divergence can be rewritten to compare two different approximations directly:

$$
KL(P||Q_0) - KL(P||Q_1) = E_P[\log Q_1(x)] - E_P[\log Q_0(x)]
$$

The term involving only the reference distribution, E_P[log P(x)], cancels in the difference. However, the expectation is still taken with respect to the original distribution P. In the next section, AIC is introduced as an approximation that can be used without knowing P.

