How can you perform a one-tailed two-sample Kolmogorov–Smirnov test in Python?
I'm trying to perform a two-sample KS test in Python 3 to detect any significant difference between distributions. For convenience, let a and b be the data columns of a .csv file that I'd like to compare. I simply ran the following code:
from scipy.stats import ks_2samp
ks_2samp(a, b)
The returned values contain the greatest distance (statistic) and the p-value (pvalue):
Ks_2sampResult(statistic=0.0329418537762845, pvalue=0.000127997328482532)
What I would like to know is: since ks_2samp only handles the two-sided two-sample KS test, is there a way to perform a one-sided two-sample KS test in Python?
In addition, how can I find the position where the greatest distance occurs (the x-axis value)?
scipy.stats.ks_2samp already supports what you want. You just need to specify the direction in which you want to test through the alternative argument. Note that the direction refers to the empirical CDFs of the two samples, not to the raw values: a sample whose CDF lies above the other's tends to take smaller values.
The alternative option is, however, only available since SciPy 1.3.0.
ks_2samp(a, b, alternative='less')     # alternative: CDF of a lies below CDF of b at some x (a tends to be larger)
ks_2samp(a, b, alternative='greater')  # alternative: CDF of a lies above CDF of b at some x (a tends to be smaller)
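To make the direction concrete, here is a minimal sketch on synthetic data. The samples, seed, and sizes are assumptions chosen for illustration and are not from the question. Sample b is shifted to the right of a, so the empirical CDF of a lies above that of b, and only the 'greater' alternative should come out significant.

```python
import numpy as np
from scipy.stats import ks_2samp  # the alternative argument needs SciPy >= 1.3.0

# Synthetic samples (illustrative assumption): b is shifted to the right of a
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=500)
b = rng.normal(loc=0.5, scale=1.0, size=500)

two_sided = ks_2samp(a, b, alternative='two-sided')
less = ks_2samp(a, b, alternative='less')        # alternative: CDF of a below CDF of b
greater = ks_2samp(a, b, alternative='greater')  # alternative: CDF of a above CDF of b

for name, res in [('two-sided', two_sided), ('less', less), ('greater', greater)]:
    print(f"{name:>9}: statistic={res.statistic:.4f}, pvalue={res.pvalue:.3g}")
```

The two-sided statistic is the larger of the two one-sided statistics, which is a handy consistency check when choosing a direction.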
Edit: To identify the x-value where the largest difference occurs, you can use this function (mostly copy-pasted from the source of ks_2samp):
import numpy as np

def ks_2samp_x(data1, data2, alternative="two-sided"):
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    n1 = data1.shape[0]
    n2 = data2.shape[0]
    data_all = np.concatenate([data1, data2])
    # using searchsorted solves equal data problem
    cdf1 = np.searchsorted(data1, data_all, side='right') / n1
    cdf2 = np.searchsorted(data2, data_all, side='right') / n2
    cddiffs = cdf1 - cdf2
    min_idx = np.argmin(cddiffs)  # ks_2samp uses np.min or np.max respectively
    max_idx = np.argmax(cddiffs)  # here we take the index into data_all instead
    # two-sided: pick the extreme with the larger absolute difference
    # (taking max() of the two indices would not select it)
    two_sided_idx = max_idx if cddiffs[max_idx] >= -cddiffs[min_idx] else min_idx
    alt2idx = {'less': min_idx, 'greater': max_idx, 'two-sided': two_sided_idx}
    return data_all[alt2idx[alternative]]
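One way to sanity-check a reported location is to confirm that the absolute difference between the empirical CDFs at that point equals the statistic returned by ks_2samp. The sketch below does this on synthetic data. The helper name ecdf_diff and the samples are hypothetical, introduced only for illustration, and are not part of SciPy.

```python
import numpy as np
from scipy.stats import ks_2samp

def ecdf_diff(data1, data2):
    """Hypothetical helper: pooled points and the ECDF difference F1 - F2 at each."""
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    data_all = np.concatenate([data1, data2])
    cdf1 = np.searchsorted(data1, data_all, side='right') / data1.shape[0]
    cdf2 = np.searchsorted(data2, data_all, side='right') / data2.shape[0]
    return data_all, cdf1 - cdf2

# Synthetic samples (illustrative assumption)
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=300)
b = rng.normal(0.5, 1.0, size=300)

points, diffs = ecdf_diff(a, b)
idx = np.argmax(np.abs(diffs))  # index of the largest absolute ECDF gap
x_at_max = points[idx]          # x-value where the gap occurs
d_at_max = abs(diffs[idx])      # size of the gap at that x-value

res = ks_2samp(a, b)            # two-sided by default
print(f"x = {x_at_max:.4f}, gap = {d_at_max:.4f}, ks_2samp statistic = {res.statistic:.4f}")
```

If the gap and the statistic agree, the x-value is the location of the two-sided statistic. Recent SciPy releases also appear to report this location on the result object as statistic_location; it is worth checking the documentation for the installed version before relying on it.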