
How to calculate the largest distance between two cumulative sample distributions in Python?

Assume there are two 1D NumPy array samples of the same length, X1 and X2. After converting each sample separately into an empirical cumulative distribution, how can I calculate the largest distance between the two cumulative sample distributions? What should I do after the code below?

import numpy as np
def function(X1, X2):
    x1 = np.sort(X1)
    y1 = np.arange(1, len(x1)+1) / float(len(x1))
    x2 = np.sort(X2)
    y2 = np.arange(1, len(x2)+1) / float(len(x2))

From your kolmogorov-smirnov tag I gather that the function you are looking for is in SciPy; see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html .

One of its input modes takes two sample vectors, which makes this even easier than what you started to implement. Just call it directly, as in these examples:

from scipy.stats import kstest
import numpy as np

# Three samples of 100 draws each; samps3 is shifted by loc=1.
# No seed is set, so the printed values below are from one run and will vary.
samps1 = np.random.normal(size=100)
samps2 = np.random.normal(size=100)
samps3 = np.random.normal(loc=1, size=100)

kstest(samps1, samps2)
# KstestResult(statistic=0.15, pvalue=0.21117008625127576)
kstest(samps2, samps1)  # the statistic is symmetric in its arguments
# KstestResult(statistic=0.15, pvalue=0.21117008625127576)
kstest(samps1, samps3)
# KstestResult(statistic=0.29, pvalue=0.0004117410017938115)
kstest(samps2, samps1).statistic
# 0.15

Note that the function returns both the statistic and the p-value, so to get the largest distance alone you need to access .statistic on the result, as in the last call above.
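
If you would rather finish the manual approach from the question, here is a minimal sketch using NumPy only. The function name max_ecdf_distance is made up for illustration. The idea is that both empirical CDFs are step functions that only change at observed values, so it is enough to evaluate them at every point of the pooled sample and take the largest absolute difference. This should give the same value as kstest(X1, X2).statistic, but treat it as a sketch rather than a replacement for the SciPy function.

```python
import numpy as np

def max_ecdf_distance(X1, X2):
    """Largest vertical distance between the empirical CDFs of X1 and X2.

    Hypothetical helper for illustration; not part of NumPy or SciPy.
    """
    x1 = np.sort(X1)
    x2 = np.sort(X2)
    # Evaluate both ECDFs at every observed value from either sample.
    grid = np.concatenate([x1, x2])
    # searchsorted(..., side="right") counts how many sorted values are <= each
    # grid point; dividing by the sample size gives the ECDF at that point.
    y1 = np.searchsorted(x1, grid, side="right") / len(x1)
    y2 = np.searchsorted(x2, grid, side="right") / len(x2)
    return float(np.max(np.abs(y1 - y2)))

# Small deterministic check: the two ECDFs differ by at most 0.5.
print(max_ecdf_distance([1, 2, 3, 4], [3, 4, 5, 6]))  # prints 0.5
```

Note that this also works when the two samples have different lengths, since each ECDF is normalised by its own sample size.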
