Kernel Density Estimation
KDE is a useful non-parametric estimate of a sample's underlying distribution. Being non-parametric means that no assumptions are made about the form of the sample's distribution.
The KDE is generated by placing a kernel, e.g. a small Gaussian distribution, over each data point and then summing all the kernels. Consider the sample below:
# Import numpy to generate a sample and matplotlib to plot it
import numpy as np
import matplotlib.pyplot as plt
# Generate a sample drawn from two Gaussian distributions
X = np.concatenate((np.random.normal(0, 1, 80),
                    np.random.normal(8, 1, 20)))[:, np.newaxis]
# Visualize its histogram
_ = plt.hist(X, density=True)
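The summing-of-kernels idea can also be sketched by hand before reaching for a library. The snippet below is a minimal illustration: the evaluation grid and the bandwidth value are arbitrary choices, and the kernel is a Gaussian density placed on each data point.

```python
import numpy as np

# Same mixture of two Gaussians as above (seeded for reproducibility)
rng = np.random.default_rng(0)
x = np.concatenate((rng.normal(0, 1, 80), rng.normal(8, 1, 20)))

grid = np.linspace(-4, 12, 200)
bandwidth = 0.5  # arbitrary kernel width, chosen for illustration

# Place a normalized Gaussian kernel on each data point...
kernels = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
kernels /= bandwidth * np.sqrt(2 * np.pi)
# ...and average them to obtain the density estimate on the grid
density = kernels.mean(axis=1)
```

Averaging (rather than plain summing) keeps the result a proper density that integrates to one.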
Using sklearn's KernelDensity, a KDE can be estimated as:
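A minimal sketch with `sklearn.neighbors.KernelDensity` follows; the `bandwidth=0.5` value and the evaluation grid are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Same mixture of two Gaussians as above (seeded for reproducibility)
np.random.seed(0)
X = np.concatenate((np.random.normal(0, 1, 80),
                    np.random.normal(8, 1, 20)))[:, np.newaxis]

# Fit a Gaussian-kernel KDE; bandwidth=0.5 is an illustrative choice
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

# score_samples returns the log-density, so exponentiate to plot it
grid = np.linspace(-4, 12, 200)[:, np.newaxis]
density = np.exp(kde.score_samples(grid))
```

`density` can then be plotted over `grid` to compare the smooth estimate against the histogram.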

Here kernel selects the distribution placed over each data point, and bandwidth controls the width of each kernel.
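The effect of these two parameters can be seen by fitting several estimators on the same sample. The kernel names below are part of the KernelDensity API; the specific bandwidth values are illustrative. A small bandwidth produces a spiky estimate with high peaks, while a large one oversmooths.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = np.concatenate((rng.normal(0, 1, 80),
                    rng.normal(8, 1, 20)))[:, np.newaxis]
grid = np.linspace(-4, 12, 200)[:, np.newaxis]

# Fit KDEs with different kernels and bandwidths (illustrative values)
results = {}
for kernel, bandwidth in [('gaussian', 0.2), ('gaussian', 2.0), ('tophat', 0.5)]:
    kde = KernelDensity(kernel=kernel, bandwidth=bandwidth).fit(X)
    results[(kernel, bandwidth)] = np.exp(kde.score_samples(grid))
```

Plotting each entry of `results` against `grid` makes the smoothing trade-off visible.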