
How do I find the KL Divergence of samples from two 2D distributions?

Suppose I had two 2D sets of 1000 samples that look something like this:

[scatter plot of the two 2D sample sets]

I'd like a metric for the amount of difference between the two distributions, and I thought the KL divergence would be suitable.

I've been looking at sp.stats.entropy(); however, from this answer, Interpreting scipy.stats.entropy values, it appears I need to convert the samples to a pdf first. How can one do this using four 1D arrays?
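One possible approach (a minimal sketch; the function name kl_from_samples and the bin count are my own choices, not anything from the question): bin both sample sets on a shared 2D grid with np.histogram2d, turn the counts into discrete pdfs, and pass the flattened arrays to scipy.stats.entropy, which returns KL(p || q) when given two distributions.

import numpy as np
from scipy.stats import entropy

def kl_from_samples(p_x, p_y, q_x, q_y, bins=20):
    # Shared bin edges so both histograms live on the same grid
    x_edges = np.linspace(min(p_x.min(), q_x.min()),
                          max(p_x.max(), q_x.max()), bins + 1)
    y_edges = np.linspace(min(p_y.min(), q_y.min()),
                          max(p_y.max(), q_y.max()), bins + 1)

    p_hist, _, _ = np.histogram2d(p_x, p_y, bins=[x_edges, y_edges])
    q_hist, _, _ = np.histogram2d(q_x, q_y, bins=[x_edges, y_edges])

    # Small constant keeps empty bins from producing log(0) or
    # division by zero; the estimate is sensitive to this choice
    eps = 1e-10
    p = p_hist.ravel() + eps
    q = q_hist.ravel() + eps

    # entropy(p, q) normalises both inputs and returns KL(p || q)
    return entropy(p, q)

Note that the result depends heavily on the bin count and on the epsilon used for empty bins, which is the usual weakness of histogram-based KL estimates.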

The example data above was generated as follows:

import numpy as np
import matplotlib.pyplot as plt

# Two 2D sample sets drawn from Gaussians with different means
dist1_x = np.random.normal(0, 10, 1000)
dist1_y = np.random.normal(0, 5, 1000)

dist2_x = np.random.normal(3, 10, 1000)
dist2_y = np.random.normal(4, 5, 1000)

plt.scatter(dist1_x, dist1_y)
plt.scatter(dist2_x, dist2_y)
plt.show()

For my real data I only have the samples, not the distributions from which they came (although, if need be, one could calculate the mean and variance and assume they are Gaussian). Is it possible to calculate the KL divergence like this?
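If one takes the Gaussian route mentioned above, the KL divergence between two multivariate normals has a closed form, so no density estimation is needed. A minimal sketch under that assumption (kl_gaussian is a hypothetical name, not from the question):

import numpy as np

def kl_gaussian(samples_p, samples_q):
    # KL(N_p || N_q) for Gaussians fitted to (n, d) sample arrays, using
    # the closed form:
    #   0.5 * [tr(S_q^-1 S_p) + (mu_q - mu_p)^T S_q^-1 (mu_q - mu_p)
    #          - d + ln(det S_q / det S_p)]
    mu_p, mu_q = samples_p.mean(axis=0), samples_q.mean(axis=0)
    cov_p = np.cov(samples_p, rowvar=False)
    cov_q = np.cov(samples_q, rowvar=False)
    d = mu_p.shape[0]

    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p

    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

# Usage with the example data: stack the x/y arrays into (1000, 2) sets
# kl = kl_gaussian(np.column_stack([dist1_x, dist1_y]),
#                  np.column_stack([dist2_x, dist2_y]))

Of course this only gives a sensible answer when the Gaussian assumption is reasonable for the real data.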

There is a paper called "Kullback-Leibler Divergence Estimation of Continuous Distributions" (2008) that estimates the KL divergence directly from two sets of samples.

And you might find an open-source implementation here: https://gist.github.com/atabakd/ed0f7581f8510c8587bc2f41a094b518
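For reference, the estimator in that paper uses nearest-neighbour distances rather than histograms. Below is a minimal sketch of the 1-nearest-neighbour version of the idea (my own paraphrase, not a verified copy of the gist; knn_kl_divergence is a hypothetical name):

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(p_samples, q_samples):
    # Estimate KL(P || Q) from (n, d) and (m, d) sample arrays
    n, d = p_samples.shape
    m = q_samples.shape[0]

    # Distance from each p-sample to its nearest neighbour in P;
    # k=2 because the closest point is the sample itself, at distance 0
    rho = cKDTree(p_samples).query(p_samples, k=2)[0][:, 1]
    # Distance from each p-sample to its nearest neighbour in Q
    nu = cKDTree(q_samples).query(p_samples, k=1)[0]

    # Nearest-neighbour estimator; it is consistent, but can come out
    # negative for finite sample sizes
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))

Because it works directly on the samples, this avoids having to choose bins or assume a parametric form for the distributions.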
