Kernel Density Estimation
Kernel density estimation (KDE) is a useful non-parametric method for estimating a sample's underlying distribution. Being non-parametric means that no assumptions are made about the form of the sample's distribution.
A KDE is generated by placing a kernel, e.g. a small Gaussian distribution, over each data point and then summing over all the kernels. Consider the sample below:
# Import numpy to generate a sample and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
# Generate a sample drawn from two Gaussian distributions
X = np.concatenate((np.random.normal(0, 1, 80),
                    np.random.normal(8, 1, 20)))[:, np.newaxis]
# Visualize its histogram
_ = plt.hist(X, density=True)

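To make the construction concrete, here is a minimal sketch of a KDE computed by hand, averaging one Gaussian bump per data point (the bandwidth value and variable names are illustrative, not part of any library API):

# Illustrative sketch: place a Gaussian bump on each point of X
# (generated above) and average the bumps; bandwidth is arbitrary
bandwidth = 0.5
x_plot = np.linspace(-5, 12, 1000)
# One Gaussian per data point, evaluated on the plotting grid;
# broadcasting (x_plot - X) gives one row per data point
bumps = (np.exp(-0.5 * ((x_plot - X) / bandwidth) ** 2)
         / (bandwidth * np.sqrt(2 * np.pi)))
# The KDE is the average of all the bumps
plt.plot(x_plot, bumps.mean(axis=0))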
Using sklearn's KernelDensity, the same KDE can be estimated as:
# Import KDE from sklearn
from sklearn.neighbors import KernelDensity
# Fit a kernel density estimator with a Gaussian kernel and bandwidth 0.5
kde = KernelDensity(kernel='gaussian', bandwidth=0.5)
kde.fit(X)
# Evaluate the estimate over a range of points
X_range = np.linspace(-5, 10, 1000)[:, np.newaxis]
log_dens = kde.score_samples(X_range)
# score_samples returns the log-density, so exponentiate before plotting
plt.plot(X_range[:, 0], np.exp(log_dens))

Here, kernel selects the distribution placed over each data point (sklearn also supports 'tophat', 'epanechnikov', 'exponential', 'linear', and 'cosine'), and bandwidth controls the width of each kernel.
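As a quick sketch of how these parameters change the estimate (the specific kernel and bandwidth values below are arbitrary choices for illustration), several estimators can be fit and plotted side by side, reusing X and X_range from above:

# Compare a few kernels and bandwidths on the same sample
# (X and X_range are defined in the snippets above)
for kernel, bandwidth in [('gaussian', 0.2), ('gaussian', 1.0), ('tophat', 0.5)]:
    kde = KernelDensity(kernel=kernel, bandwidth=bandwidth).fit(X)
    plt.plot(X_range[:, 0], np.exp(kde.score_samples(X_range)),
             label=f'{kernel}, bandwidth={bandwidth}')
# A narrow bandwidth follows the data closely; a wide one smooths it out
_ = plt.legend()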