How can you perform a one-tailed two-sample Kolmogorov–Smirnov test in Python?

I'm trying to perform a two-sample KS test in Python 3 to detect any significant difference between two distributions. For convenience, let a and b be the data columns of a .csv file that I'd like to compare. I simply ran the following code:

from scipy.stats import ks_2samp
ks_2samp(a, b)

The returned value contains the greatest distance (statistic) and the p-value (pvalue):

Ks_2sampResult(statistic=0.0329418537762845, pvalue=0.000127997328482532)

What I would like to know is: since ks_2samp seems to handle only the two-sided two-sample KS test, is there a way to perform a one-sided two-sample KS test in Python?

In addition, how can I find the position (the x-axis value) where the greatest distance occurs?

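scipy.stats.ks_2samp already supports what you want. You just need to specify the direction of the test through the alternative argument. Note that the direction refers to the empirical CDFs, not to the sample values themselves: 'less' means the CDF of the first sample lies below the CDF of the second somewhere, which is what you see when the first sample tends to take larger values.

The alternative option is, however, only available since scipy 1.3.0.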

ks_2samp(a, b, alternative='less')     # alternative: CDF of a < CDF of b at some x (a tends to be larger)
ks_2samp(a, b, alternative='greater')  # alternative: CDF of a > CDF of b at some x (a tends to be smaller)
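
To see which direction is which, here is a small sketch on synthetic data. The samples, the seed and the variable names are made up for illustration; only ks_2samp and its alternative argument come from scipy. Sample a is drawn from a distribution shifted toward larger values, so its empirical CDF lies below that of b, and 'less' should be the alternative that gives a small p-value.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical data: a is shifted toward larger values than b.
rng = np.random.default_rng(0)
a = rng.normal(loc=1.0, scale=1.0, size=500)
b = rng.normal(loc=0.0, scale=1.0, size=500)

# 'less': alternative is that the CDF of a lies below the CDF of b somewhere.
res_less = ks_2samp(a, b, alternative='less')
# 'greater': alternative is that the CDF of a lies above the CDF of b somewhere.
res_greater = ks_2samp(a, b, alternative='greater')

print("less:   ", res_less.statistic, res_less.pvalue)
print("greater:", res_greater.statistic, res_greater.pvalue)
```

With this setup the 'less' test should reject and the 'greater' test should not, which is an easy way to double-check the direction before running it on real data.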

Edit: To identify the x-value where the largest difference occurs, you can use this function (mostly copied from the source of ks_2samp):

import numpy as np

def ks_2samp_x(data1, data2, alternative="two-sided"):
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    n1 = data1.shape[0]
    n2 = data2.shape[0]

    data_all = np.concatenate([data1, data2])
    # using searchsorted solves equal data problem
    cdf1 = np.searchsorted(data1, data_all, side='right') / n1
    cdf2 = np.searchsorted(data2, data_all, side='right') / n2
    cddiffs = cdf1 - cdf2
    # ks_2samp uses np.min / np.max here; argmin / argmax give the index in data_all instead
    min_idx = np.argmin(cddiffs)   # where cdf1 is furthest below cdf2
    max_idx = np.argmax(cddiffs)   # where cdf1 is furthest above cdf2
    # two-sided: take the index with the larger absolute difference
    # (comparing the indices themselves, as in max(min_idx, max_idx), would be wrong)
    two_sided_idx = max_idx if cddiffs[max_idx] >= -cddiffs[min_idx] else min_idx
    alt2idx = {'less': min_idx, 'greater': max_idx, 'two-sided': two_sided_idx}
    return data_all[alt2idx[alternative]]
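
A quick sanity check for any such location is to evaluate both empirical CDFs at that x and compare the gap with the statistic reported by ks_2samp. The sketch below does this for the two-sided case on made-up data; the ecdf helper and the variable names are my own, not part of scipy. If I recall correctly, newer SciPy releases also expose a statistic_location attribute on the result, so it is worth checking the docs for your version.

```python
import numpy as np
from scipy.stats import ks_2samp

def ecdf(sample, x):
    """Fraction of observations in `sample` that are <= x (hypothetical helper)."""
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, x, side='right') / sample.size

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=300)
b = rng.normal(0.3, 1.5, size=400)

# Evaluate both ECDFs at every observed point and find the largest absolute gap.
grid = np.concatenate([a, b])
diffs = ecdf(a, grid) - ecdf(b, grid)
x_star = grid[np.argmax(np.abs(diffs))]

# The gap at x_star should match the two-sided KS statistic.
gap_at_x_star = abs(ecdf(a, x_star) - ecdf(b, x_star))
statistic = ks_2samp(a, b).statistic
print(x_star, gap_at_x_star, statistic)
```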
