
How do I find the KL Divergence of samples from two 2D distributions?

Suppose I had two 2D sets of 1000 samples that look something like this:

[scatter plot of the two 2D sample sets]

I'd like to have a metric for the amount of difference between the distributions and thought the KL divergence would be suitable.

I've been looking at sp.stats.entropy(); however, from this answer, Interpreting scipy.stats.entropy values, it appears I need to convert the samples to a PDF first. How can one do this with the four 1D sample arrays?
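One way to do this seems to be binning both sample sets on a shared 2D grid and passing the flattened, normalised histograms to scipy.stats.entropy(), which returns the KL divergence when given two distributions. A rough sketch (the helper name kl_from_samples_hist, the bin count, and the smoothing epsilon are my own illustrative choices):

import numpy as np
from scipy.stats import entropy

def kl_from_samples_hist(p_x, p_y, q_x, q_y, bins=20, eps=1e-10):
    # Illustrative helper: histogram-based estimate of KL(P||Q)
    # from four 1D coordinate arrays.
    # Use one shared set of bin edges so both histograms live on the same grid.
    x_edges = np.histogram_bin_edges(np.concatenate([p_x, q_x]), bins=bins)
    y_edges = np.histogram_bin_edges(np.concatenate([p_y, q_y]), bins=bins)

    p_hist, _, _ = np.histogram2d(p_x, p_y, bins=[x_edges, y_edges])
    q_hist, _, _ = np.histogram2d(q_x, q_y, bins=[x_edges, y_edges])

    # Smooth empty bins so the ratio is defined, flatten, and let entropy()
    # normalise; entropy(p, q) returns sum(p * log(p / q)).
    p = (p_hist + eps).ravel()
    q = (q_hist + eps).ravel()
    return entropy(p, q)

The result depends on the bin width, which is the usual drawback of histogram density estimates in 2D.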

The example data above was generated as follows:

import numpy as np
import matplotlib.pyplot as plt

# Two 2D Gaussian clouds with the same spread but shifted means
dist1_x = np.random.normal(0, 10, 1000)
dist1_y = np.random.normal(0, 5, 1000)

dist2_x = np.random.normal(3, 10, 1000)
dist2_y = np.random.normal(4, 5, 1000)

plt.scatter(dist1_x, dist1_y)
plt.scatter(dist2_x, dist2_y)
plt.show()

For my real data I only have the samples, not the distributions they came from (although, if need be, one could calculate the mean and variance and assume they are Gaussian). Is it possible to calculate the KL divergence like this?
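If the Gaussian assumption is acceptable, one possibility is to fit a mean and covariance to each sample set and plug them into the closed-form KL divergence between two multivariate normals. A sketch (the helper name kl_gaussian_fit is just illustrative):

import numpy as np

def kl_gaussian_fit(p_x, p_y, q_x, q_y):
    # Illustrative helper: KL(P||Q) assuming both sample sets are 2D Gaussian.
    p = np.column_stack([p_x, p_y])
    q = np.column_stack([q_x, q_y])

    mu_p, cov_p = p.mean(axis=0), np.cov(p, rowvar=False)
    mu_q, cov_q = q.mean(axis=0), np.cov(q, rowvar=False)

    k = p.shape[1]                      # dimensionality, 2 here
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p

    # Closed-form KL between N(mu_p, cov_p) and N(mu_q, cov_q):
    # 0.5 * (tr(Sq^-1 Sp) + diff^T Sq^-1 diff - k + ln(det Sq / det Sp))
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

This of course only makes sense when the data really are close to Gaussian; otherwise a nonparametric estimator is safer.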

There is a paper called "Kullback-Leibler Divergence Estimation of Continuous Distributions" (Pérez-Cruz, 2008) that covers exactly this, and you can find an open-source implementation of its estimator here: https://gist.github.com/atabakd/ed0f7581f8510c8587bc2f41a094b518
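Roughly, that paper's 1-nearest-neighbour estimator compares, for each sample drawn from P, the distance to its nearest neighbour within the same sample set against the distance to its nearest neighbour in the other set. A sketch in that spirit, paraphrasing the linked gist rather than copying it (the helper name knn_kl_divergence is just illustrative):

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(p_samples, q_samples):
    # Illustrative helper: 1-NN estimate of KL(P||Q) for continuous samples,
    # following Perez-Cruz (2008). p_samples is (n, d), q_samples is (m, d).
    p = np.atleast_2d(p_samples)
    q = np.atleast_2d(q_samples)
    n, d = p.shape
    m, _ = q.shape

    # Distance from each p_i to its nearest neighbour in P (k=2 skips the
    # zero self-distance) and to its nearest neighbour in Q. Duplicate
    # points would give zero distances and need special handling.
    r = cKDTree(p).query(p, k=2)[0][:, 1]
    s = cKDTree(q).query(p, k=1)[0]

    return (d / n) * np.sum(np.log(s / r)) + np.log(m / (n - 1.0))

For the example data above this would be called as knn_kl_divergence(np.column_stack([dist1_x, dist1_y]), np.column_stack([dist2_x, dist2_y])).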
