Very simple test:
From above, (1) gives a low p-value while (2) gives a high p-value. Why?
Code for (1):

from scipy.stats import kstest, poisson

noPts = 1000
lambdaPoisson = 10
# Draw 1000 samples from Poisson(10), then test them against the
# Poisson(10) CDF with the one-sample KS test
my_data = poisson.rvs(size=noPts, mu=lambdaPoisson)
ks_statistic, p_value = kstest(my_data, 'poisson', args=(lambdaPoisson, 0))
print(ks_statistic, p_value)
Results: 0.1239297144718523 7.61680985798287e-14
Code for (2):

from scipy.stats import ks_2samp, poisson

noPts = 1000
lambdaPoisson = 10
# Draw two independent Poisson(10) samples of different sizes and
# compare them with the two-sample KS test
my_data1 = poisson.rvs(size=noPts, mu=lambdaPoisson)
my_data2 = poisson.rvs(size=noPts*1000, mu=lambdaPoisson)
ks_statistic, p_value = ks_2samp(my_data1, my_data2)
print(ks_statistic, p_value)
Results: 0.023672000000000026 0.6301973762116004
The one-sample test in (1) is at fault, not the data. kstest assumes the reference distribution is continuous, but the Poisson distribution is discrete: its CDF jumps at every integer, and a sample of 1000 draws contains many ties. At a tied value the statistic effectively compares the theoretical CDF after its jump with the empirical CDF just before it, so the KS statistic converges to roughly the largest single-point probability. For Poisson(10), pmf(10) ≈ 0.125, which matches the reported statistic of 0.124; with n = 1000 that deviation yields an astronomically small p-value even though the null hypothesis is true.
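A minimal sketch that isolates the discreteness effect (names like `rng`, `stat_c`, `p_c` are my own): the same one-sample kstest call applied to continuous normal data behaves as expected, while on Poisson data the p-value collapses even though the null hypothesis is true in both cases.

```python
import numpy as np
from scipy.stats import kstest, norm, poisson

rng = np.random.default_rng(42)
n = 1000

# Continuous reference distribution: one-sample KS behaves as designed
continuous = norm.rvs(size=n, random_state=rng)
stat_c, p_c = kstest(continuous, 'norm')

# Discrete reference distribution: same call, but the p-value collapses
# even though the data really are Poisson(10)
discrete = poisson.rvs(mu=10, size=n, random_state=rng)
stat_d, p_d = kstest(discrete, 'poisson', args=(10,))

print(f"continuous: D={stat_c:.3f} p={p_c:.3f}")
print(f"discrete:   D={stat_d:.3f} p={p_d:.2e}")
```

The discrete D should land near pmf(10) ≈ 0.125, an order of magnitude larger than the continuous D, which is what drives the tiny p-value.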
The two-sample test in (2) is fine: ks_2samp compares two empirical CDFs that jump at the same integers by comparable amounts, so it correctly fails to reject, and unequal sample sizes are handled by the test itself. Rescaling lambda is not the fix: multiplying the mu of my_data2 by 1000 would produce a genuinely different distribution (mean 10000 instead of 10), and the loc argument only shifts it. If you need a one-sample goodness-of-fit test for discrete data, use a chi-square test on binned counts (or a KS variant adapted to discrete distributions) rather than kstest; applied to any Poisson sample individually, kstest will report an essentially zero p-value regardless of how well the data fit.
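As a sketch of a discrete-friendly alternative, here is a chi-square goodness-of-fit test on binned Poisson counts using scipy.stats.chisquare. The bin edges (lumping x ≤ 3 and x ≥ 18 into tail bins) are my own illustrative choice, made so that every expected count stays above the usual rule-of-thumb minimum of about 5.

```python
import numpy as np
from scipy.stats import poisson, chisquare

rng = np.random.default_rng(0)
mu, n = 10, 1000
data = poisson.rvs(mu=mu, size=n, random_state=rng)

# Bins: {<=3}, {4}, {5}, ..., {17}, {>=18}
ks = np.arange(4, 18)
observed = np.concatenate((
    [(data <= 3).sum()],
    [(data == k).sum() for k in ks],
    [(data >= 18).sum()],
))
# Expected counts from the Poisson(10) pmf; the tail bins use the
# CDF and survival function so the expected counts sum exactly to n
expected = n * np.concatenate((
    [poisson.cdf(3, mu)],
    poisson.pmf(ks, mu),
    [poisson.sf(17, mu)],
))

stat, p = chisquare(observed, expected)
print(f"chi2={stat:.2f} p={p:.3f}")
```

Unlike kstest, this test respects the discreteness of the data, so under the true model the p-value is well-behaved rather than collapsing to zero.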