
Different t-test p-values between R and Python

I'm new to Python and am trying to learn more about propensity score matching. I found a great tutorial from Stanford.edu that covers this (since this is my first post, Stack Overflow won't let me post two links, but you can Google "Stanford propensity score matching"). My goal is to recreate it all in Python and understand what's happening.

My issue arises in section 1.2, "Difference-in-means: pre-treatment covariates", when I start running t-tests. I don't understand why the p-values are so different between R and Python for the same test and the same data.

R code:

with(ecls, t.test(race_white ~ catholic, var.equal=FALSE))

R output:

        Welch Two Sample t-test

data:  race_white by catholic
t = -13.453, df = 2143.3, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1936817 -0.1444003
sample estimates:
mean in group 0 mean in group 1
      0.5561246       0.7251656

When I perform the same thing in Python, my t-statistic and degrees of freedom are identical, but my p-values are way off.

Python code:

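import pandas as pd
import statsmodels.stats.api as sms

dat = pd.read_csv('ecls.csv')  # same data file as the R code
cath = dat[dat['catholic'] == 1]['race_white']
noncath = dat[dat['catholic'] == 0]['race_white']
# Welch's t-test (unequal variances); returns (t-statistic, p-value, df)
fina = sms.ttest_ind(noncath, cath, alternative='two-sided', usevar='unequal')
print(fina)
print("The t-statistic is %.3f the p-value is %.3f and the df is %.3f" % fina)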

Python output:

(-13.45342570302274, 1.1413329198468439e-39, 2143.2902027156415)
The t-statistic is -13.453 the p-value is 0.000 and the df is 2143.290

I'm using the exact same dataset and just can't figure out why the two are different. I saw another SO topic that was similar, but the conclusion there was that the sample sizes were different. Here I am using the same data set, so the sizes aren't different.

The data file (ecls.csv), used for both Python and R, can be found here. Any help as to why the p-values are different for this t-test is greatly appreciated.

R does not print p-values below 2.2e-16, but they are still calculated and stored. Try this for your R code:

with(ecls, t.test(race_white ~ catholic, var.equal=FALSE))$p.value
[1] 1.141333e-39

The value is effectively zero, which is why you see 0.000 when you print it to 3 decimal places in Python. Try printing the unmodified p-value in Python (don't use %.3f; in fact you already did, with print(fina)) and you should see about the same value as in R (in fact, you do!).
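To make the formatting point concrete, here is a minimal sketch using only the Python standard library and the numbers copied from the question's output. The helper `format_pval` is hypothetical: it only roughly imitates the way R's printed summary reports tiny p-values as "< 2.2e-16", on the assumption that the cutoff is double-precision machine epsilon. It is not R's actual implementation.

```python
import sys

# Values copied from the Python output in the question.
t_stat, p_value, df = (-13.45342570302274, 1.1413329198468439e-39, 2143.2902027156415)

# Fixed-point formatting with 3 decimals rounds any value below 0.0005 to 0.000.
fixed = "%.3f" % p_value

# Scientific notation keeps the magnitude visible.
scientific = "%.3e" % p_value


def format_pval(p, eps=sys.float_info.epsilon):
    """Hypothetical helper: show p-values below eps as '< eps'.

    This is an illustrative assumption about the display rule,
    not a port of R's code.
    """
    if p < eps:
        return "< %.2g" % eps
    return "%.4g" % p


print(fixed)                 # fixed-point view of the p-value
print(scientific)            # scientific-notation view of the same number
print(format_pval(p_value))  # thresholded view, similar to R's printed summary
```

All three lines describe the same stored number; only the display differs, which is why the R and Python results in the question actually agree.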


 