
Equivalent t-test results in PANDAS?

I am trying to learn a bit of pandas, so I am going through some R code and trying to reproduce it in Python.

I have the following simple example in R:

tempdat <- data.frame(unit=c('feet','feet','feet','feet','metres','metres','metres','metres'),
                  feet=c(50,45,75,60,26,32,40,45))
t.test(feet~unit, alternative='two.sided', conf.level=.95, var.equal=F, data=tempdat)

I want to run the equivalent test in Python. This is what I have so far, but the results are different:

import pandas as pd
from scipy import stats
# Same data as the R data.frame; equal_var=False requests Welch's t-test (var.equal=F in R)
tempdat = pd.DataFrame({'unit':['feet','feet','feet','feet','metres','metres','metres','metres'], 'feet':[50,45,75,60,26,32,40,45]})
feet_group = tempdat[tempdat['unit']=='feet']
metres_group = tempdat[tempdat['unit']=='metres']
stats.ttest_ind(feet_group['feet'], metres_group['feet'], equal_var=False)
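
As a cross-check that depends on neither R nor scipy, the Welch statistic that both calls are expected to compute can be worked out by hand. The following is only a minimal sketch using the standard library; the helper name welch_t is hypothetical, and it returns the t statistic and the Welch-Satterthwaite degrees of freedom, not a p-value.

```python
from statistics import mean, variance

feet = [50, 45, 75, 60]
metres = [26, 32, 40, 45]

def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    # Squared standard error of each group mean (sample variance / n)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, df = welch_t(feet, metres)
print(t_stat, df)
```

If the t statistic reported by either tool does not match this value, the data reaching the test are probably not the eight observations listed above.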

At first glance, the error is in the first line: tempdat is a built-in Python dict, so its keys must be unique. After the definition

tempdat={'feet':50,'feet':45,'feet':75,'feet':60,'metres':26,'metres':32,'metres':40,'metres':45}

only the last value assigned to each key remains:

tempdat={'feet': 60, 'metres': 45}

That is why the test results differ.
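
The key collapse described above can be reproduced directly, along with one possible way to keep every observation by mapping each unit to a list. This is a sketch only; the names collapsed, pairs and grouped are illustrative and do not come from the question.

```python
# Repeated keys in a dict literal: each later value overwrites the earlier one.
collapsed = {'feet': 50, 'feet': 45, 'feet': 75, 'feet': 60,
             'metres': 26, 'metres': 32, 'metres': 40, 'metres': 45}
print(collapsed)  # {'feet': 60, 'metres': 45}

# Keeping all observations: store (unit, value) pairs and group them into lists.
pairs = [('feet', 50), ('feet', 45), ('feet', 75), ('feet', 60),
         ('metres', 26), ('metres', 32), ('metres', 40), ('metres', 45)]
grouped = {}
for unit, value in pairs:
    grouped.setdefault(unit, []).append(value)
print(grouped)
```

With the values grouped this way, each list holds the four observations that a two-sample test needs.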
