
How to compare 2D distributions?

I need to compare 2D distributions with KL divergence. I tried using scipy.stats.entropy, but it returns inf.

How do I set up scipy.stats.entropy to work with two axes and return a single value?

I tried:

from scipy.stats import entropy
import pandas as pd

one = pd.read_csv(file_one)
two = pd.read_csv(file_two)
pk = [list(item) for item in zip(one["X"], one["Y"])]
qk = [list(item) for item in zip(two["X"], two["Y"])]
for l in [pk, qk]:
    for i in range(len(l)):
        for j in range(len(l[i])):
            # to confirm that no values are 0 
            #(will change to a smaller value once inf is not being returned)
            if abs(l[i][j]) < 0.1:
                l[i][j] = 0.1
print(entropy(pk, qk))

That prints: [inf inf]

What I really want is a single value, but first I need it to stop returning inf.

Look at the equation for KL Divergence:

S = sum(pk * log(pk / qk), axis=0)

If qk has zero values where pk is non-zero, the log term blows up and that is where your infinities come from. pk and qk are meant to be probability distributions (even discrete ones), so replace the zeros with very small values rather than leaving them at zero. As for the shape issue: entropy sums along axis 0, which is why you get an array of two values, one per column. Flatten the input, or better, bin the 2D samples into histograms on a common grid and compare those, as sketched below.
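
Here is a minimal sketch of the histogram approach, reusing the file_one/file_two placeholders and the "X"/"Y" columns from your question; the bin count (50) and the smoothing constant eps are arbitrary choices you would tune for your data:

import numpy as np
import pandas as pd
from scipy.stats import entropy

one = pd.read_csv(file_one)
two = pd.read_csv(file_two)

# Build 2D histograms of both samples on the same grid so the bins line up.
xedges = np.linspace(min(one["X"].min(), two["X"].min()),
                     max(one["X"].max(), two["X"].max()), 51)
yedges = np.linspace(min(one["Y"].min(), two["Y"].min()),
                     max(one["Y"].max(), two["Y"].max()), 51)

hist_one, _, _ = np.histogram2d(one["X"], one["Y"], bins=[xedges, yedges])
hist_two, _, _ = np.histogram2d(two["X"], two["Y"], bins=[xedges, yedges])

# Flatten to 1D and add a tiny constant so qk has no zero bins (zeros cause inf).
# entropy() normalises each vector to sum to 1 before computing the divergence.
eps = 1e-10
pk = hist_one.flatten() + eps
qk = hist_two.flatten() + eps

print(entropy(pk, qk))  # a single, finite KL-divergence value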

Edit: You can't have negative values either; what would a negative probability mean? KL divergence compares probability distributions and isn't defined otherwise. If you build pk and qk from histogram counts as above, both problems go away: counts are never negative, and entropy normalises them to sum to one.
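
If you want an explicit check before calling entropy, a small helper along these lines (hypothetical, not part of scipy) can reject negative weights and normalise the rest:

import numpy as np

def as_probabilities(values):
    # Turn raw non-negative weights into a probability vector.
    # Negative weights are rejected, since a negative probability is meaningless.
    values = np.asarray(values, dtype=float)
    if np.any(values < 0):
        raise ValueError("KL divergence needs non-negative inputs")
    return values / values.sum()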
