
Weighted Gaussian kernel density estimation in `python`

Update: Weighted samples are now supported by scipy.stats.gaussian_kde through its weights argument.
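As a minimal sketch of what this looks like in practice, assuming a SciPy version in which gaussian_kde accepts weights: the sample values, the weights, and the bandwidth factor of 0.5 below are illustrative choices, not taken from the original post.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data: the value 10 carries three times the weight of the value 5.
samples = np.array([10.0, 5.0])
weights = np.array([3.0, 1.0])

# A scalar bw_method is a multiplicative factor applied to the (weighted)
# sample covariance; it is not an absolute bandwidth.
kde = gaussian_kde(samples, bw_method=0.5, weights=weights)

# Evaluate the density on a grid and check that it integrates to roughly one.
grid = np.linspace(-10.0, 25.0, 2001)
density = kde(grid)
area = density.sum() * (grid[1] - grid[0])

print("effective sample size:", kde.neff)
print("approximate integral:", area)
```

Note that the scalar passed as bw_method scales the data covariance, so it is not directly comparable to an absolute bandwidth such as the bw argument used in other libraries.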

It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?

Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.

Example

An ipython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5

Implementation details

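The weighted arithmetic mean is

$$\bar{x} = \sum_{i=1}^{n} w_i x_i,$$

where the weights are normalized so that $\sum_{i=1}^{n} w_i = 1$.

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{1}{1 - \sum_{i=1}^{n} w_i^2} \sum_{i=1}^{n} w_i (x_i - \bar{x})(x_i - \bar{x})^T.$$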

The bandwidth can be chosen by the scott or silverman rules as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size, $n_{\mathrm{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2$.
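The quantities described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code from the linked notebook: the helper name weighted_kde_stats is hypothetical, the data layout (dimensions by samples) follows scipy's convention, and the weights are normalized to sum to one before use.

```python
import numpy as np

def weighted_kde_stats(dataset, weights):
    """Weighted mean, unbiased covariance, Kish effective sample size,
    and the Scott and Silverman bandwidth factors.

    dataset: array of shape (d, n) or (n,); weights: array of shape (n,).
    """
    dataset = np.atleast_2d(np.asarray(dataset, dtype=float))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one
    d = dataset.shape[0]

    mean = dataset @ w  # weighted arithmetic mean, shape (d,)
    residual = dataset - mean[:, None]
    # Unbiased weighted covariance for normalized weights.
    cov = (residual * w) @ residual.T / (1.0 - np.sum(w ** 2))

    # Kish's effective sample size; equals n when all weights are equal.
    neff = 1.0 / np.sum(w ** 2)

    # Rule-of-thumb factors with n replaced by the effective sample size.
    scott = neff ** (-1.0 / (d + 4))
    silverman = (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))
    return mean, cov, neff, scott, silverman

mean, cov, neff, scott, silverman = weighted_kde_stats([10.0, 5.0], [3.0, 1.0])
print(mean, cov, neff, scott, silverman)
```

With equal weights, the covariance reduces to the usual unbiased estimate with divisor n - 1 and the effective sample size reduces to n, so the unweighted case is recovered.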

For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. In that case you cannot use the FFT, so fft=False must be passed. Here is an example:

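import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted estimate: the sample 10 appears three times, the sample 5 once.
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Weighted estimate: the same data expressed as two samples with weights 3 and 1.
# Weights are only supported when the FFT is disabled.
kde2 = KDEUnivariate(np.array([10., 5.]))
kde2.fit(weights=np.array([3., 1.]),
         bw=0.5,
         fft=False)
plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')
plt.show()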

which produces this figure: [figure]
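With a fixed bandwidth, giving the sample 10 a weight of 3 and the sample 5 a weight of 1 should yield the same density as repeating 10 three times. The following NumPy-only sketch checks that equivalence directly; it does not use statsmodels, and the helper name weighted_gaussian_kde_1d is hypothetical.

```python
import numpy as np

def weighted_gaussian_kde_1d(points, samples, weights=None, bw=0.5):
    """Evaluate a 1-D Gaussian KDE with fixed bandwidth bw at the given points."""
    points = np.atleast_1d(np.asarray(points, dtype=float))
    samples = np.asarray(samples, dtype=float)
    if weights is None:
        weights = np.ones_like(samples)  # equal weights by default
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the density integrates to one

    # Gaussian kernel evaluated for every (point, sample) pair.
    z = (points[:, None] - samples[None, :]) / bw
    kernel = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2.0 * np.pi))
    return kernel @ w  # weighted sum over the samples

grid = np.linspace(0.0, 15.0, 301)
repeated = weighted_gaussian_kde_1d(grid, [10.0, 10.0, 10.0, 5.0])
weighted = weighted_gaussian_kde_1d(grid, [10.0, 5.0], weights=[3.0, 1.0])
print(np.allclose(repeated, weighted))
```

The equivalence only holds when the bandwidth is fixed. If the bandwidth is chosen by a rule of thumb, the repeated and weighted versions can differ, because the rule depends on the number of samples or the effective sample size.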

Check out the packages PyQT-Fit and statistics for Python. They seem to have kernel density estimation with weighted observations.
