
How do I find the KL Divergence of samples from two 2D distributions?

Suppose I had two 2D sets of 1000 samples that look something like this:

[scatter plot of the two 2D sample sets]

I'd like a metric for the amount of difference between the two distributions, and I thought the KL divergence would be suitable.

I've been looking at sp.stats.entropy(); however, from this answer, Interpreting scipy.stats.entropy values, it appears I need to convert the samples to a pdf first. How can one do this using four 1D arrays?
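One possible approach (a minimal sketch; the function name kl_from_samples and the bin count are my own choices, not anything from the question): bin both sample sets on a shared 2D grid with np.histogram2d, turn the counts into discrete pdfs, and pass the flattened arrays to scipy.stats.entropy, which returns KL(p || q) when given two distributions.

import numpy as np
from scipy.stats import entropy

def kl_from_samples(p_x, p_y, q_x, q_y, bins=20):
    # Shared bin edges so both histograms live on the same grid
    x_edges = np.linspace(min(p_x.min(), q_x.min()),
                          max(p_x.max(), q_x.max()), bins + 1)
    y_edges = np.linspace(min(p_y.min(), q_y.min()),
                          max(p_y.max(), q_y.max()), bins + 1)

    p_hist, _, _ = np.histogram2d(p_x, p_y, bins=[x_edges, y_edges])
    q_hist, _, _ = np.histogram2d(q_x, q_y, bins=[x_edges, y_edges])

    # Small constant keeps empty bins from producing log(0) or
    # division by zero; the estimate is sensitive to this choice
    eps = 1e-10
    p = p_hist.ravel() + eps
    q = q_hist.ravel() + eps

    # entropy(p, q) normalises both inputs and returns KL(p || q)
    return entropy(p, q)

Note that the result depends heavily on the bin count and on the epsilon used for empty bins, which is the usual weakness of histogram-based KL estimates.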

The example data above was generated as follows:

import numpy as np
import matplotlib.pyplot as plt

# Two 2D sample sets drawn from Gaussians with different means
dist1_x = np.random.normal(0, 10, 1000)
dist1_y = np.random.normal(0, 5, 1000)

dist2_x = np.random.normal(3, 10, 1000)
dist2_y = np.random.normal(4, 5, 1000)

plt.scatter(dist1_x, dist1_y)
plt.scatter(dist2_x, dist2_y)
plt.show()

For my real data I only have the samples, not the distributions from which they came (although, if need be, one could calculate the mean and variance and assume they are Gaussian). Is it possible to calculate the KL divergence like this?
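If one takes the Gaussian route mentioned above, the KL divergence between two multivariate normals has a closed form, so no density estimation is needed. A minimal sketch under that assumption (kl_gaussian is a hypothetical name, not from the question):

import numpy as np

def kl_gaussian(samples_p, samples_q):
    # KL(N_p || N_q) for Gaussians fitted to (n, d) sample arrays, using
    # the closed form:
    #   0.5 * [tr(S_q^-1 S_p) + (mu_q - mu_p)^T S_q^-1 (mu_q - mu_p)
    #          - d + ln(det S_q / det S_p)]
    mu_p, mu_q = samples_p.mean(axis=0), samples_q.mean(axis=0)
    cov_p = np.cov(samples_p, rowvar=False)
    cov_q = np.cov(samples_q, rowvar=False)
    d = mu_p.shape[0]

    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p

    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

# Usage with the example data: stack the x/y arrays into (1000, 2) sets
# kl = kl_gaussian(np.column_stack([dist1_x, dist1_y]),
#                  np.column_stack([dist2_x, dist2_y]))

Of course this only gives a sensible answer when the Gaussian assumption is reasonable for the real data.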

There is a paper called "Kullback-Leibler Divergence Estimation of Continuous Distributions" (2008) that estimates the KL divergence directly from two sets of samples.

And you might find an open-source implementation here: https://gist.github.com/atabakd/ed0f7581f8510c8587bc2f41a094b518
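For reference, the estimator in that paper uses nearest-neighbour distances rather than histograms. Below is a minimal sketch of the 1-nearest-neighbour version of the idea (my own paraphrase, not a verified copy of the gist; knn_kl_divergence is a hypothetical name):

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(p_samples, q_samples):
    # Estimate KL(P || Q) from (n, d) and (m, d) sample arrays
    n, d = p_samples.shape
    m = q_samples.shape[0]

    # Distance from each p-sample to its nearest neighbour in P;
    # k=2 because the closest point is the sample itself, at distance 0
    rho = cKDTree(p_samples).query(p_samples, k=2)[0][:, 1]
    # Distance from each p-sample to its nearest neighbour in Q
    nu = cKDTree(q_samples).query(p_samples, k=1)[0]

    # Nearest-neighbour estimator; it is consistent, but can come out
    # negative for finite sample sizes
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))

Because it works directly on the samples, this avoids having to choose bins or assume a parametric form for the distributions.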
