Why does t-test in Python (scipy, statsmodels) give results different from R, Stata, or Excel?
(Problem solved: x, y and s1, s2 were of different sizes.)
In R:
x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
t.test(x,y)
t = -1.6229, df = 29.727, p-value = 0.1152
Stata and Excel give the same numbers.
t.test(x,y,alternative="less")
t = -1.6229, df = 29.727, p-value = 0.05758
No matter which options I try, I cannot reproduce these results with statsmodels.stats.weightstats.ttest_ind or scipy.stats.ttest_ind.
statsmodels.stats.weightstats.ttest_ind(s1,s2,alternative="two-sided",usevar="unequal")
(-1.8912081781378358, 0.066740317997990656, 35.666557473974343)
scipy.stats.ttest_ind(s1,s2,equal_var=False)
(array(-1.8912081781378338), 0.066740317997990892)
scipy.stats.ttest_ind(s1,s2,equal_var=True)
(array(-1.8912081781378338), 0.066664507499812745)
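There must be thousands of people using Python to compute t-tests. Are we all getting incorrect results? (I usually rely on Python, but this time I checked my results with Stata.)
The short answer is that the t-tests available in Python give the same results as R and Stata; you simply had an additional element in your Python arrays.
This is what I get with the default, which assumes equal variances: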
>>> x_ = (373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
>>> y_ = (411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
>>> from scipy import stats
>>> stats.ttest_ind(x_, y_)
(array(-1.62292672368488), 0.11506840827144681)
>>> import statsmodels.api as sm
>>> sm.stats.ttest_ind(x_, y_)
(-1.6229267236848799, 0.11506840827144681, 30.0)
And with unequal variances:
>>> sm.stats.ttest_ind(x_, y_, alternative="two-sided", usevar="unequal")
(-1.6229267236848799, 0.11516398707890187, 29.727196553288369)
>>> stats.ttest_ind(x_, y_, equal_var=False)
(array(-1.62292672368488), 0.11516398707890187)
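To see where these numbers come from, here is a minimal sketch that computes Welch's unequal-variance t-test by hand for the same data and checks the sample sizes first, since a size mismatch was the actual cause of the discrepancy. The helper name welch_t and the variable names are my own, not part of scipy or statsmodels; only scipy.stats.t is used from the library.

```python
import math
from scipy import stats

x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]

# Sanity check: make sure both samples hold exactly the data you think they do.
print(len(x), len(y))


def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((v - mean_a) ** 2 for v in a) / (na - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (nb - 1)
    # Squared standard error of each sample mean.
    se2_a, se2_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


t_stat, df = welch_t(x, y)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
p_less = stats.t.cdf(t_stat, df)               # one-sided, alternative="less"
print(t_stat, df, p_two_sided, p_less)
```

The statistic, degrees of freedom, and two-sided p-value should agree with the R output in the question (t = -1.6229, df = 29.727, p-value = 0.1152), and the one-sided value with 0.05758. If a library call disagrees with the hand computation, comparing len() of the inputs is a quick first check.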