I need to calculate a p-value for two sets of data, comparing each row in one DataFrame with the corresponding row in another DataFrame. For example, array1 would be the five items in row 300 of df1 (not including stdev and Ctrl average), and array2 would be the five items in row 300 of df2 (not including stdev).
df1:
     Pep Ctrl 1  Pep Ctrl 2  Pep Ctrl 3  Pep Ctrl 4  Pep Ctrl 5         stdev  Ctrl average
300  47591000.0         NaN  49576000.0  41288000.0  61727000.0  8.551730e+06  4.174675e+07
301   4305900.0   2670800.0         NaN         NaN   7338400.0  2.368407e+06  4.170877e+06
302  11466000.0   3799400.0         NaN  18552000.0  31661000.0  1.184124e+07  1.546393e+07
303  11255000.0   5402300.0  18337000.0  19706000.0  40286000.0  1.321849e+07  1.803413e+07
df2:
     MCI 1 vs Ctrl normalized  MCI 2 vs Ctrl normalized  MCI 3 vs Ctrl normalized  MCI 4 vs Ctrl normalized  MCI 5 vs Ctrl normalized         stdev
300              1.054045e+08              4.980206e+07              4.764870e+07              1.834201e+07              2.994124e+07  3.346473e+07
301              1.019931e+07              3.309509e+06              6.595145e+06              1.089385e+07                       NaN  3.508776e+06
302              3.288333e+07              6.953062e+06              1.430190e+07              4.988915e+06              2.310888e+07  1.162495e+07
303              3.332308e+07              1.682790e+07              2.951138e+07              9.474570e+06              2.965893e+07  1.014219e+07
I need to do a two-tailed t-test assuming equal variances, and then add the resulting p-value as the last column. Alternatively, if SciPy has an option to take just the number of items, standard deviation, and average as input, that could also work.
This is what I tried:
group1 = [df1['Pep Ctrl 1'],df1['Pep Ctrl 2'],df1['Pep Ctrl 3'],df1['Pep Ctrl 4'],df1['Pep Ctrl 5']]
group2 = [df2['MCI 1 vs Ctrl normalized'], df2['MCI 2 vs Ctrl normalized'], df2['MCI 3 vs Ctrl normalized'], df2['MCI 4 vs Ctrl normalized'], df2['MCI 5 vs Ctrl normalized']]
ttest = stats.ttest_ind(a=group1,b=group2,axis = 1, equal_var = True)
Any help would be appreciated.
df1 constructor:
{'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]}
df2 constructor:
{'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]}
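The two constructors above can be turned into DataFrames roughly as follows. This is only a sketch: the row labels 300 to 303 are copied from the printed tables, and the bare nan in the constructors is assumed to be NumPy's NaN.

```python
import numpy as np
import pandas as pd

nan = np.nan                 # the constructors use a bare `nan`
idx = [300, 301, 302, 303]   # row labels as shown in the printed tables

df1 = pd.DataFrame(
    {'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
     'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
     'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
     'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
     'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
     'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
     'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]},
    index=idx)

df2 = pd.DataFrame(
    {'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
     'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
     'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
     'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
     'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
     'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]},
    index=idx)

print(df1.shape, df2.shape)
```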
You could use iterrows to iterate over df1 and compare each row with the corresponding row in df2 that has the same index:
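from scipy import stats

# Compare the replicate columns only; the summary columns are dropped.
df2_cols = df2.columns.drop('stdev')
# nan_policy='omit' ignores NaN values in either row.
out = [stats.ttest_ind(df2.loc[i, df2_cols], row, equal_var=True, nan_policy='omit')
       for i, row in df1.drop(columns=['stdev', 'Ctrl average']).iterrows()]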
Output:
[Ttest_indResult(statistic=0.010483243999151896, pvalue=0.9919282503324176),
Ttest_indResult(statistic=1.2563264347346306, pvalue=0.26449954642964396),
Ttest_indResult(statistic=0.009874028613226149, pvalue=0.9923973079846519),
Ttest_indResult(statistic=0.6390907092148139, pvalue=0.5406265164807074)]
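To get the p-values as the last column, and to cover the alternative raised in the question (passing only counts, means and standard deviations), one possible sketch is below. It assumes both frames list the same rows in the same order, and the column name p-value is only an illustrative choice. scipy.stats.ttest_ind_from_stats is the SciPy function that takes summary statistics; here they are recomputed from the raw replicate columns rather than read from the existing stdev and Ctrl average columns, so that the counts reflect the missing values.

```python
import numpy as np
import pandas as pd
from scipy import stats

nan = np.nan
idx = [300, 301, 302, 303]
df1 = pd.DataFrame(
    {'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
     'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
     'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
     'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
     'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
     'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
     'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]},
    index=idx)
df2 = pd.DataFrame(
    {'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
     'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
     'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
     'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
     'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
     'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]},
    index=idx)

# Keep only the replicate columns of each frame.
ctrl = df1.drop(columns=['stdev', 'Ctrl average'])
mci = df2.drop(columns=['stdev'])

# Option 1: one call over all rows (axis=1), ignoring NaN values.
res = stats.ttest_ind(mci.to_numpy(), ctrl.to_numpy(), axis=1,
                      equal_var=True, nan_policy='omit')
pvalues = np.asarray(res.pvalue, dtype=float)

# Option 2: the same test from per-row summary statistics.
# pandas skips NaN by default; ddof=1 gives the sample standard deviation.
res_stats = stats.ttest_ind_from_stats(
    mean1=mci.mean(axis=1).to_numpy(),
    std1=mci.std(axis=1, ddof=1).to_numpy(),
    nobs1=mci.notna().sum(axis=1).to_numpy(),
    mean2=ctrl.mean(axis=1).to_numpy(),
    std2=ctrl.std(axis=1, ddof=1).to_numpy(),
    nobs2=ctrl.notna().sum(axis=1).to_numpy(),
    equal_var=True)
pvalues_from_stats = np.asarray(res_stats.pvalue, dtype=float)

# Append the p-values as the last column ('p-value' is a hypothetical name).
df2['p-value'] = pvalues
print(df2['p-value'])
```

Both options should give the same p-values as the iterrows loop; the summary-statistics route is mainly useful when only the counts, means and standard deviations are available.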