
One-sided one sample T test by group on data frame?

I am trying to perform a one-sided, one-sample t-test by group on a pandas data frame in Python. I feel like I am close, but I can't get the last bit to work. I was trying to follow something similar to these questions (One Sided One Sample T Test Python and T-test for groups within a Pandas dataframe for a unique id).

Say for example I have a data frame df

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A', 
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275]})

and I wanted to generate a new data frame df_pval with just two columns: the 'ID' and p-value from a one-sided, one sample T test. I could do this in R like so:

library(dplyr)

df_pval <- df %>%
     group_by(ID) %>%
     summarise(res = list(t.test(value, mu = 0.220, alternative = 'greater')))

df_pval <- data.frame(ID = df_pval$ID,
     pval = sapply(df_pval$res, function(x) x[['p.value']]))

In fact, right now I use os to run an external R script to perform this action, but I know it must be possible just in python. I have tried creating a 'groupby' object and then running .apply :

df_groupID = df.groupby('ID').agg({'value': list})
df_groupID.apply(lambda x: stats.ttest_1samp(x['value'], 0.220))

but this doesn't work. As of now I'm stuck. Any help on this issue would be greatly appreciated. Thank you in advance and sorry if this has already been answered before (and I just didn't understand the solution).

Maybe try something like this (using scipy 1.7.3):

import pandas as pd
from scipy import stats
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A', 
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275]})
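# Collect each group's values into a list
df_groupID = df.groupby('ID').agg({'value': list})

# One-sided, one-sample t-test per group (H1: mean > 0.220); keep only the p-value
df_groupID['value'] = df_groupID['value'].apply(lambda x: stats.ttest_1samp(x, 0.220, alternative='greater').pvalue)
df_groupID = df_groupID.rename(columns={'value': 'pval'})
print(df_groupID)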
        pval
ID          
A   0.999335
B   0.457619
C   0.010342

which is equivalent to:

library(dplyr)

df <- data.frame(
   ID = c ('A', 'A', 'A', 'A', 'A', 
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'), 
   value = c(0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275)
)

df_pval <- df %>%
     group_by(ID) %>%
     summarise(res = list(t.test(value, mu = 0.220, alternative = 'greater')))
df_pval <- data.frame(ID = df_pval$ID,
     pval = sapply(df_pval$res, function(x) x[['p.value']]))
print(df_pval)
  ID       pval
1  A 0.99933491
2  B 0.45761885
3  C 0.01034185
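A more compact variant, offered as a sketch rather than as the answerer's method: run the test on each group's values directly and reset the index, which gives the two-column frame (ID and p-value) the question asks for. It assumes a SciPy version whose ttest_1samp accepts the alternative argument (the answer above used 1.7.3); the name df_pval is borrowed from the question. The attempt in the question most likely fails because DataFrame.apply runs column-wise by default, so x['value'] looks up a row label that does not exist.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'ID': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})

# Each group's 'value' column arrives as a Series, so no list aggregation is needed.
df_pval = (
    df.groupby('ID')['value']
      .apply(lambda s: stats.ttest_1samp(s, 0.220, alternative='greater').pvalue)
      .reset_index(name='pval')   # turn the ID index back into a column
)
print(df_pval)
```

The result has the columns ID and pval, with one row per group.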

For a one-sided test, this should work with any version of SciPy:

import pandas as pd
from scipy import stats
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A', 
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275]})


df2 = df.groupby('ID').agg({'value': list})

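# Halving the two-sided p-value is only valid when the t statistic points in the
# direction of the alternative (here: sample mean > 0.220); otherwise use 1 - p/2.
def pval_greater(x, popmean=0.220):
    res = stats.ttest_1samp(x, popmean=popmean)
    return res.pvalue / 2 if res.statistic > 0 else 1 - res.pvalue / 2

df2['p_value'] = df2.value.apply(pval_greater)

df2

Gives:

                                 value   p_value
ID                                              
A    [0.2, 0.201, 0.189, 0.199, 0.205]  0.999335
B    [0.22, 0.225, 0.209, 0.218, 0.23]  0.457619
C   [0.308, 0.291, 0.34, 0.444, 0.275]  0.010342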
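As a cross-check that does not rely on the alternative argument at all, the 'greater' p-value can be computed by hand: it is the upper-tail probability of Student's t distribution with n - 1 degrees of freedom, evaluated at the t statistic. The sketch below assumes that textbook definition; the helper name one_sided_greater is made up for illustration. Note that half of the two-sided p-value equals this tail probability only when the t statistic is positive; when the sample mean is below the hypothesized mean (as in group A), the one-sided p-value is 1 minus that half.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'ID': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})

def one_sided_greater(values, popmean):
    """P-value for H1: mean > popmean (hypothetical helper, not a SciPy function)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    # t statistic: (sample mean - hypothesized mean) / standard error
    t_stat = (x.mean() - popmean) / (x.std(ddof=1) / np.sqrt(n))
    # Upper-tail probability of Student's t with n - 1 degrees of freedom
    return stats.t.sf(t_stat, n - 1)

df_check = (
    df.groupby('ID')['value']
      .apply(lambda s: one_sided_greater(s, 0.220))
      .reset_index(name='pval')
)
print(df_check)
```

The values should agree with the R output above, which makes this a convenient sanity check for any halving shortcut.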
