Pandas t 检验使用行作为 arrays

Question

I need to find a way to calculate a p-value for two sets of data, comparing each row in one DataFrame with the accompanying row in another DataFrame. For example, array1 would be the five items in row 300 (not including stdev and Ctrl average), and same for array2 with the five items in row 300.我需要找到一种方法来计算两组数据的 p 值，将一个 DataFrame 中的每一行与另一个 DataFrame 中的相应行进行比较。例如，array1 将是第 300 行中的五个项目（不包括 stdev 和 Ctrl平均值），对于第 300 行中的五个项目的 array2 也是如此。

df1:

       Pep Ctrl 1  Pep Ctrl 2   Pep Ctrl 3  Pep Ctrl 4  Pep Ctrl 5         stdev  Ctrl average
300    47591000.0         NaN   49576000.0  41288000.0  61727000.0  8.551730e+06  4.174675e+07
301     4305900.0   2670800.0          NaN         NaN   7338400.0  2.368407e+06  4.170877e+06
302    11466000.0   3799400.0          NaN  18552000.0  31661000.0  1.184124e+07  1.546393e+07
303    11255000.0   5402300.0   18337000.0  19706000.0  40286000.0  1.321849e+07  1.803413e+07

df2:

      MCI 1 vs Ctrl normalized  MCI 2 vs Ctrl normalized  MCI 3 vs Ctrl normalized  MCI 4 vs Ctrl normalized  MCI 5 vs Ctrl normalized         stdev
300               1.054045e+08              4.980206e+07              4.764870e+07              1.834201e+07              2.994124e+07  3.346473e+07
301               1.019931e+07              3.309509e+06              6.595145e+06              1.089385e+07                       NaN  3.508776e+06
302               3.288333e+07              6.953062e+06              1.430190e+07              4.988915e+06              2.310888e+07  1.162495e+07
303               3.332308e+07              1.682790e+07              2.951138e+07              9.474570e+06              2.965893e+07  1.014219e+07

I need to do a two-tailed t test with equal variances, and then add this as the last column.我需要做一个等方差的双尾 t 检验，然后将其添加为最后一列。 Alternatively, if SciPy has an option to just input the number of items, standard deviation, and average, this could also work.或者，如果 SciPy 可以选择仅输入项目数、标准差和平均值，这也可以。

This is what I tried:这是我试过的：

group1 = [df1['Pep Ctrl 1'],df1['Pep Ctrl 2'],df1['Pep Ctrl 3'],df1['Pep Ctrl 4'],df1['Pep Ctrl 5']]
group2 = [df2['MCI 1 vs Ctrl normalized'], df2['MCI 2 vs Ctrl normalized'], df2['MCI 3 vs Ctrl normalized'], df2['MCI 4 vs Ctrl normalized'], df2['MCI 5 vs Ctrl normalized']]
ttest = stats.ttest_ind(a=group1,b=group2,axis = 1, equal_var = True)

Any help would be appreciated.任何帮助，将不胜感激。

df1 constructor: df1构造函数：

{'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
 'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
 'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
 'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
 'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
 'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
 'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]}

df2 constructor: df2构造函数：

{'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
 'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
 'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
 'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
 'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
 'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]}

Answer 1

You could use iterrows to iterate over df1 and compare each row with a corresponding row in df2 with the same index:您可以使用iterrows迭代df1并将每一行与df2中具有相同索引的相应行进行比较：

from scipy import stats
df2_cols = df2.columns.drop('stdev')
out = [stats.ttest_ind(df2.loc[i, df2_cols], row, equal_var=True, nan_policy='omit') 
       for i, row in df1.drop(columns=['stdev','Ctrl average']).iterrows()]

Output: Output：

[Ttest_indResult(statistic=0.010483243999151896, pvalue=0.9919282503324176), 
 Ttest_indResult(statistic=1.2563264347346306, pvalue=0.26449954642964396), 
 Ttest_indResult(statistic=0.009874028613226149, pvalue=0.9923973079846519), 
 Ttest_indResult(statistic=0.6390907092148139, pvalue=0.5406265164807074)]

Pandas t 检验使用行作为 arrays

问题描述

1 个解决方案

解决方案1
0

Pandas t 检验使用行作为 arrays

问题描述

1 个解决方案

解决方案1 0

解决方案1
0