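Pandas t-test using row as the arrays
I need to find a way to compute p-values for two groups of data, comparing each row in one DataFrame with the corresponding row in another DataFrame. For example, array1 would be the five items in row 300 of df1 (excluding stdev and Ctrl average), and array2 would likewise be the five items in row 300 of df2.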
df1:
     Pep Ctrl 1  Pep Ctrl 2  Pep Ctrl 3  Pep Ctrl 4  Pep Ctrl 5         stdev  Ctrl average
300  47591000.0         NaN  49576000.0  41288000.0  61727000.0  8.551730e+06  4.174675e+07
301   4305900.0   2670800.0         NaN         NaN   7338400.0  2.368407e+06  4.170877e+06
302  11466000.0   3799400.0         NaN  18552000.0  31661000.0  1.184124e+07  1.546393e+07
303  11255000.0   5402300.0  18337000.0  19706000.0  40286000.0  1.321849e+07  1.803413e+07
df2:
     MCI 1 vs Ctrl normalized  MCI 2 vs Ctrl normalized  MCI 3 vs Ctrl normalized  MCI 4 vs Ctrl normalized  MCI 5 vs Ctrl normalized         stdev
300              1.054045e+08              4.980206e+07              4.764870e+07              1.834201e+07              2.994124e+07  3.346473e+07
301              1.019931e+07              3.309509e+06              6.595145e+06              1.089385e+07                       NaN  3.508776e+06
302              3.288333e+07              6.953062e+06              1.430190e+07              4.988915e+06              2.310888e+07  1.162495e+07
303              3.332308e+07              1.682790e+07              2.951138e+07              9.474570e+06              2.965893e+07  1.014219e+07
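I need to run a two-tailed t-test assuming equal variances and then add the result as the last column. Alternatively, if SciPy has an option that takes only the number of items, the standard deviation, and the mean as input, that would work too.
Here is what I tried: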
group1 = [df1['Pep Ctrl 1'],df1['Pep Ctrl 2'],df1['Pep Ctrl 3'],df1['Pep Ctrl 4'],df1['Pep Ctrl 5']]
group2 = [df2['MCI 1 vs Ctrl normalized'], df2['MCI 2 vs Ctrl normalized'], df2['MCI 3 vs Ctrl normalized'], df2['MCI 4 vs Ctrl normalized'], df2['MCI 5 vs Ctrl normalized']]
ttest = stats.ttest_ind(a=group1,b=group2,axis = 1, equal_var = True)
Any help would be appreciated.
df1 constructor:
{'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]}
df2 constructor:
{'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]}
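On the second part of the question (a test from only the number of items, standard deviation, and mean): SciPy provides scipy.stats.ttest_ind_from_stats for this. The sketch below uses the row 301 values from the data above and recomputes the summary statistics from the non-NaN entries rather than reading the stored stdev and Ctrl average columns, since the post does not say how those columns were calculated. The helper name summarize is made up for illustration, and the standard deviation is assumed to be the sample standard deviation (ddof=1), which is what this function expects.

```python
import numpy as np
from scipy import stats

# Row 301 from df1 (Pep Ctrl 1..5) and df2 (MCI 1..5 vs Ctrl normalized)
ctrl = np.array([4305900.0, 2670800.0, np.nan, np.nan, 7338400.0])
mci = np.array([10199310.0, 3309509.0, 6595145.0, 10893850.0, np.nan])


def summarize(values):
    """Hypothetical helper: mean, sample std (ddof=1) and count, ignoring NaN."""
    valid = values[~np.isnan(values)]
    return valid.mean(), valid.std(ddof=1), valid.size


mean_c, std_c, n_c = summarize(ctrl)
mean_m, std_m, n_m = summarize(mci)

# Two-sided, equal-variance t-test computed from summary statistics only
result = stats.ttest_ind_from_stats(mean_m, std_m, n_m,
                                    mean_c, std_c, n_c,
                                    equal_var=True)
print(result.statistic, result.pvalue)
```

If the stored columns are used instead, the counts passed as nobs should be the number of non-NaN values in each row, not always 5.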
You can use iterrows to iterate over df1 and compare each row with the corresponding row in df2 that has the same index:
from scipy import stats
df2_cols = df2.columns.drop('stdev')
out = [stats.ttest_ind(df2.loc[i, df2_cols], row, equal_var=True, nan_policy='omit')
for i, row in df1.drop(columns=['stdev','Ctrl average']).iterrows()]
Output:
[Ttest_indResult(statistic=0.010483243999151896, pvalue=0.9919282503324176),
Ttest_indResult(statistic=1.2563264347346306, pvalue=0.26449954642964396),
Ttest_indResult(statistic=0.009874028613226149, pvalue=0.9923973079846519),
Ttest_indResult(statistic=0.6390907092148139, pvalue=0.5406265164807074)]
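The question also asks for the p-value to be added as the last column. A minimal sketch of one way to do that is shown below, rebuilding the two DataFrames from the constructors in the question. The column name 'p-value', the choice to attach it to df2, and selecting the measurement columns by their 'Pep Ctrl' and 'MCI' prefixes are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import stats

nan = np.nan
index = [300, 301, 302, 303]
df1 = pd.DataFrame({
    'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
    'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
    'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
    'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
    'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
    'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
    'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0],
}, index=index)
df2 = pd.DataFrame({
    'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
    'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
    'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
    'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
    'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
    'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0],
}, index=index)

# Keep only the measurement columns (assumed to be identifiable by prefix)
ctrl_cols = [c for c in df1.columns if c.startswith('Pep Ctrl')]
mci_cols = [c for c in df2.columns if c.startswith('MCI')]

pvalues = {}
for i in df1.index:
    ctrl = df1.loc[i, ctrl_cols].to_numpy(dtype=float)
    mci = df2.loc[i, mci_cols].to_numpy(dtype=float)
    # Two-sided, equal-variance t-test; NaN entries are dropped per row
    res = stats.ttest_ind(mci, ctrl, equal_var=True, nan_policy='omit')
    pvalues[i] = float(res.pvalue)

# A new column assigned this way is appended after the existing columns
df2['p-value'] = pd.Series(pvalues)
print(df2['p-value'])
```

Building the Series from a dict keyed by the row label means the values are aligned on the index when assigned, so the result does not depend on the two frames being in the same row order.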