One-sided one sample T test by group on data frame?

Question

I am trying to perform a one-sided, one-sample t-test by group on a pandas data frame in Python. I feel like I am close, but I can't get the last step working. I tried to follow similar questions ("One Sided One Sample T Test Python" and "T-test for groups within a Pandas dataframe for a unique id").

Say, for example, I have a data frame df:

import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A',
                          'B', 'B', 'B', 'B', 'B',
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})

and I want to generate a new data frame df_pval with just two columns: 'ID' and the p-value from a one-sided, one-sample t-test. In R I could do this like so:

library(dplyr)

df_pval <- df %>%
     group_by(ID) %>%
     summarise(res = list(t.test(value, mu = 0.220, alternative = 'greater')))

df_pval <- data.frame(ID = df_pval$ID,
     pval = sapply(df_pval$res, function(x) x[['p.value']]))
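For reference, t.test(value, mu = 0.220, alternative = 'greater') tests H0: μ = μ0 against H1: μ > μ0 (μ0 = 0.220 here) with the usual one-sample t statistic. The one-sided p-value is the upper-tail probability of a t distribution with n - 1 degrees of freedom; the case split shows how it relates to the two-sided p-value p_two that a test without an alternative reports:

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

p_{\text{greater}} = P(T_{n-1} \ge t)
  = \begin{cases}
      p_{\text{two}}/2,     & t \ge 0 \\
      1 - p_{\text{two}}/2, & t < 0
    \end{cases}
\qquad \text{where } p_{\text{two}} = 2\,P(T_{n-1} \ge |t|)
```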

In fact, right now I use os to run an external R script for this, but I know it must be possible in Python alone. I have tried creating a 'groupby' object and then running .apply:

df_groupID = df.groupby('ID').agg({'value': list})
df_groupID.apply(lambda x: stats.ttest_1samp(x['value'], 0.220))

but this doesn't work, and I'm stuck. Any help would be greatly appreciated. Thank you in advance, and sorry if this has already been answered (and I just didn't understand the solution).
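A note on why that attempt fails: DataFrame.apply passes each column (not each row) to the function by default, so x is the whole 'value' column and x['value'] looks up a row label that does not exist, which raises a KeyError. On top of that, ttest_1samp without alternative is two-sided. Below is a minimal sketch that skips the list aggregation and works on the groups directly, assuming scipy >= 1.6 (needed for the alternative argument); the helper name pval_greater is hypothetical:

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'ID': list('AAAAABBBBBCCCCC'),
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})


def pval_greater(values, popmean=0.220):
    """Hypothetical helper: one-sided ('greater') one-sample t-test p-value."""
    return stats.ttest_1samp(values, popmean, alternative='greater').pvalue


# groupby(...)['value'].apply hands each group's values to the helper as a Series
# and returns one p-value per ID; reset_index turns that Series into the
# two-column frame ID | pval, like df_pval in the R version.
df_pval = df.groupby('ID')['value'].apply(pval_greater).reset_index(name='pval')
print(df_pval)
```

Working on the grouped Series directly avoids both the intermediate list column and the axis question that tripped up DataFrame.apply.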

Answer 1

Maybe try something like this (using scipy 1.7.3):

import pandas as pd
from scipy import stats
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A', 
                          'B', 'B', 'B', 'B', 'B', 
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205, 
                             0.220, 0.225, 0.209, 0.218, 0.230, 
                             0.308, 0.291, 0.340, 0.444, 0.275]})
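df_groupID = df.groupby('ID').agg({'value': list})

# Replace each group's list of values with its one-sided ('greater') p-value;
# the alternative argument requires scipy >= 1.6
df_groupID['value'] = df_groupID.value.apply(
    lambda x: stats.ttest_1samp(x, 0.220, alternative='greater').pvalue)
df_groupID = df_groupID.rename(columns={'value': 'pval'})
print(df_groupID)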

        pval
ID          
A   0.999335
B   0.457619
C   0.010342

which is equivalent to:

library(dplyr)

df <- data.frame(
  ID = c('A', 'A', 'A', 'A', 'A',
         'B', 'B', 'B', 'B', 'B',
         'C', 'C', 'C', 'C', 'C'),
  value = c(0.200, 0.201, 0.189, 0.199, 0.205,
            0.220, 0.225, 0.209, 0.218, 0.230,
            0.308, 0.291, 0.340, 0.444, 0.275)
)

df_pval <- df %>%
     group_by(ID) %>%
     summarise(res = list(t.test(value, mu = 0.220, alternative = 'greater')))
df_pval <- data.frame(ID = df_pval$ID,
     pval = sapply(df_pval$res, function(x) x[['p.value']]))
print(df_pval)

  ID       pval
1  A 0.99933491
2  B 0.45761885
3  C 0.01034185
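To see why group A's p-value is close to 1 in both outputs above, it helps to compute the t statistic by hand. This is a small standard-library sketch (statistics.stdev uses the n - 1 denominator, as the t-test does); the helper name t_statistic is hypothetical. It shows that A's mean lies below 0.220, so t is negative and the 'greater' alternative gets almost no support, while B is barely above 0.220 and C is well above it:

```python
import math
import statistics


def t_statistic(values, popmean):
    """Hypothetical helper: one-sample t = (mean - popmean) / (s / sqrt(n))."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    return (statistics.mean(values) - popmean) / (s / math.sqrt(n))


groups = {
    'A': [0.200, 0.201, 0.189, 0.199, 0.205],
    'B': [0.220, 0.225, 0.209, 0.218, 0.230],
    'C': [0.308, 0.291, 0.340, 0.444, 0.275],
}

# t statistic per group against the hypothesised mean 0.220
t_values = {group: t_statistic(vals, 0.220) for group, vals in groups.items()}
for group, t in t_values.items():
    print(group, round(t, 3))
```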

Answer 2

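For a one-sided test, this should work with any version of scipy. ttest_1samp without alternative returns a two-sided p-value; halving it gives the 'greater' p-value only when the t statistic is positive, and when t is negative the one-sided p-value is 1 - p/2:

import pandas as pd
from scipy import stats
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A',
                          'B', 'B', 'B', 'B', 'B',
                          'C', 'C', 'C', 'C', 'C'],
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})

df2 = df.groupby('ID').agg({'value': list})

def one_sided_greater(x, popmean=0.220):
    # Two-sided test first, then convert to the 'greater' p-value using the sign of t
    res = stats.ttest_1samp(x, popmean=popmean)
    return res.pvalue / 2 if res.statistic > 0 else 1 - res.pvalue / 2

df2['p_value'] = df2.value.apply(one_sided_greater)

df2

Gives:

                                 value   p_value
ID
A    [0.2, 0.201, 0.189, 0.199, 0.205]  0.999335
B    [0.22, 0.225, 0.209, 0.218, 0.23]  0.457619
C   [0.308, 0.291, 0.34, 0.444, 0.275]  0.010342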
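Another version-independent route, sketched here rather than taken from either answer: take the t statistic from ttest_1samp and compute the upper-tail probability directly with stats.t.sf(t, n - 1). This gives the 'greater' p-value whichever side of 0.220 the sample mean falls on, so no halving or sign check is needed. The helper name p_greater is hypothetical:

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'ID': list('AAAAABBBBBCCCCC'),
                   'value': [0.200, 0.201, 0.189, 0.199, 0.205,
                             0.220, 0.225, 0.209, 0.218, 0.230,
                             0.308, 0.291, 0.340, 0.444, 0.275]})


def p_greater(values, popmean=0.220):
    """Hypothetical helper: upper-tail p-value P(T_{n-1} >= t) for a one-sample t-test."""
    t = stats.ttest_1samp(values, popmean).statistic
    # Survival function of Student's t with n - 1 degrees of freedom
    return stats.t.sf(t, len(values) - 1)


df_pval = df.groupby('ID')['value'].apply(p_greater).reset_index(name='pval')
print(df_pval)
```

Only the statistic is taken from ttest_1samp here, so the two-sided p-value it also returns is simply ignored.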

2 answers

solution1
1 2022-02-01 17:08:50

solution2
0 2022-02-01 17:41:38
