Weighted Gaussian kernel density estimation in `python`
Update: Weighted samples are now supported by scipy.stats.gaussian_kde. See here and here for details.
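Here is a minimal sketch of the updated SciPy API. It assumes SciPy 1.2 or later, where gaussian_kde gained the weights argument and the neff attribute. The sample data and weights are made up for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
samples = rng.normal(size=500)
# Hypothetical per-sample weights, e.g. importance weights from a sampler
weights = rng.uniform(0.5, 2.0, size=500)

# weights requires SciPy >= 1.2; they are normalized internally
kde = gaussian_kde(samples, bw_method="scott", weights=weights)

grid = np.linspace(-6.0, 6.0, 1201)
density = kde(grid)  # weighted density evaluated on the grid
print(kde.neff)      # Kish's effective sample size used by the bandwidth rule
```

The bandwidth rule ("scott" here) is applied with the effective sample size rather than the raw number of samples.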
It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?
Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples.

I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.
An IPython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5
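The notebook's code is not reproduced here. The following is a rough, hypothetical sketch of the same idea in plain NumPy. The class name WeightedGaussianKDE is ours, and it implements only Scott's rule. It combines the weighted mean, the unbiased weighted covariance, and Kish's effective sample size discussed in this answer, and it follows scipy's convention of a (d, n) dataset:

```python
import numpy as np


class WeightedGaussianKDE:
    """Hypothetical minimal weighted Gaussian KDE (not the notebook's code).

    dataset: shape (d, n), one column per sample; a 1-D array means d = 1.
    weights: shape (n,), non-negative; normalized internally.
    """

    def __init__(self, dataset, weights):
        self.dataset = np.atleast_2d(np.asarray(dataset, dtype=float))
        self.d, self.n = self.dataset.shape
        w = np.asarray(weights, dtype=float)
        self.weights = w / w.sum()
        sum_w2 = np.sum(self.weights ** 2)
        # Kish's effective sample size (weights already sum to 1)
        self.neff = 1.0 / sum_w2
        # Weighted mean and unbiased weighted covariance of the data
        mean = self.dataset @ self.weights
        resid = self.dataset - mean[:, None]
        self.data_cov = (resid * self.weights) @ resid.T / (1.0 - sum_w2)
        # Scott's rule with n replaced by neff; kernel cov = factor^2 * data cov
        self.factor = self.neff ** (-1.0 / (self.d + 4))
        self.covariance = self.data_cov * self.factor ** 2
        self.inv_cov = np.linalg.inv(self.covariance)
        self.norm = np.sqrt(np.linalg.det(2 * np.pi * self.covariance))

    def __call__(self, points):
        """Evaluate the density at points of shape (d, m); a 1-D array means d = 1."""
        points = np.atleast_2d(np.asarray(points, dtype=float))
        diff = points[:, None, :] - self.dataset[:, :, None]  # (d, n, m)
        # Squared Mahalanobis distance for every (sample, point) pair
        maha = np.einsum("anm,ab,bnm->nm", diff, self.inv_cov, diff)
        kernels = np.exp(-0.5 * maha) / self.norm  # (n, m)
        return self.weights @ kernels  # weighted sum over samples -> (m,)


# Usage with made-up samples and weights
rng = np.random.default_rng(0)
x = rng.normal(size=300)
w = rng.uniform(0.1, 1.0, size=300)
kde = WeightedGaussianKDE(x, w)
density = kde(np.linspace(-4.0, 4.0, 101))
```

Evaluation builds a (d, n, m) array, so large inputs would need chunking. For real use, scipy.stats.gaussian_kde with weights (SciPy 1.2+) is the safer choice.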
The weighted arithmetic mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$
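The unbiased data covariance matrix is then given by

$$\Sigma = \frac{\sum_{i=1}^{n} w_i}{\left(\sum_{i=1}^{n} w_i\right)^2 - \sum_{i=1}^{n} w_i^2} \sum_{i=1}^{n} w_i \,(x_i - \bar{x})(x_i - \bar{x})^\top.$$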
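The two formulas can be checked numerically with a small NumPy sketch. The helper weighted_mean_cov is hypothetical. When the weights are treated as reliability weights, the unbiased covariance should agree with numpy.cov called with aweights, whose default (bias=False) applies the same correction:

```python
import numpy as np


def weighted_mean_cov(x, w):
    """x: (d, n) samples as columns; w: (n,) non-negative weights."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    w = np.asarray(w, dtype=float)
    v1 = w.sum()           # sum of weights
    v2 = np.sum(w ** 2)    # sum of squared weights
    mean = x @ w / v1      # weighted arithmetic mean
    resid = x - mean[:, None]
    # Unbiased (reliability-weight) covariance
    cov = (resid * w) @ resid.T * v1 / (v1 ** 2 - v2)
    return mean, cov


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 100))
w = rng.uniform(0.0, 1.0, size=100)
mean, cov = weighted_mean_cov(x, w)
print(np.allclose(cov, np.cov(x, aweights=w)))  # True
```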
The bandwidth can be chosen by Scott's or Silverman's rule, as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size:

$$n_{\mathrm{eff}} = \frac{\left(\sum_{i=1}^{n} w_i\right)^2}{\sum_{i=1}^{n} w_i^2}.$$
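The following sketch shows how the effective sample size enters the rules of thumb. The factor formulas are written to match scipy's Scott and Silverman factors, with n replaced by n_eff. The helper names are hypothetical. The kernel covariance is the data covariance multiplied by the squared factor:

```python
import numpy as np


def kish_neff(w):
    """Kish's effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def scott_factor(neff, d):
    return neff ** (-1.0 / (d + 4))


def silverman_factor(neff, d):
    return (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))


w_equal = np.ones(100)
w_skewed = np.r_[np.full(10, 10.0), np.ones(90)]  # weight concentrated on 10 samples
print(kish_neff(w_equal))   # 100.0
print(kish_neff(w_skewed))  # about 33.12
# A smaller n_eff gives a larger factor, i.e. a wider kernel
print(scott_factor(kish_neff(w_skewed), 1) > scott_factor(kish_neff(w_equal), 1))  # True
```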
For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. With weights, however, you cannot use the FFT, so pass fft=False. Here is an example:
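```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted KDE: the value 10 appears three times
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Weighted KDE: each value appears once, 10 with weight 3 and 5 with weight 1.
# Weights are not supported together with the FFT, so fft=False is required.
kde1 = KDEUnivariate(np.array([10., 5.]))
kde1.fit(weights=np.array([3., 1.]),
         bw=0.5,
         fft=False)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'o-')
plt.show()
```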
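The comparison above relies on the fact that, with a fixed bandwidth, giving a point weight 3 is the same as repeating it three times. Here is a small NumPy check of that claim. It assumes that bw is the standard deviation of the Gaussian kernel, and the helper gaussian_kde_1d is hypothetical:

```python
import numpy as np


def gaussian_kde_1d(grid, data, bw, weights=None):
    """Fixed-bandwidth 1-D Gaussian KDE; weights are normalized to sum to 1."""
    data = np.asarray(data, dtype=float)
    w = np.ones_like(data) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    z = (grid[:, None] - data[None, :]) / bw           # (m, n) standardized offsets
    kernels = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi))
    return kernels @ w                                  # weighted sum over samples


grid = np.linspace(3.0, 12.0, 181)
replicated = gaussian_kde_1d(grid, [10., 10., 10., 5.], bw=0.5)
weighted = gaussian_kde_1d(grid, [10., 5.], bw=0.5, weights=[3., 1.])
print(np.allclose(replicated, weighted))  # True
```

With a data-driven bandwidth the two would differ. Kish's n_eff for weights [3, 1] is 16/10 = 1.6, not 4.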
Check out the packages PyQT-Fit and statistics for Python. They seem to have kernel density estimation with weighted observations.
Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original source.