How to get a p-value for each row of two columns in a pandas DataFrame?

I would like to ask for suggestions on how to calculate a p-value for each row in my pandas DataFrame. My DataFrame looks like this: there are columns with the means of Data1 and Data2, and also columns with the standard errors of those means. Each row represents one atom. Thus I need to calculate a p-value for each row (i.e., compare the mean of atom 1 from Data1 with the mean of atom 1 from Data2).

    SEM-DATA1   MEAN-DATA1  SEM-DATA2   MEAN-DATA2  
0   0.001216    0.145842    0.000959    0.143103    
1   0.002687    0.255069    0.001368    0.250505    
2   0.005267    0.321345    0.003722    0.305767    
3   0.027265    0.906731    0.033637    0.731638    
4   0.029974    0.773725    0.150025    0.960804        

I found here on Stack Overflow that many people recommend using scipy, but I don't know how to apply it the way I need. Is it possible? Thank you.

You are comparing two samples, df['MEAN-DATA1'] and df['MEAN-DATA2'], so you should do this:

from scipy import stats
# Student's two-sample t-test, treating each column as one sample
stats.ttest_ind(df['MEAN-DATA1'], df['MEAN-DATA2'])

which returns:

Ttest_indResult(statistic=0.01001479441863673, pvalue=0.9922547232600507)

or, if you only want the p-value:

a = stats.ttest_ind(df['MEAN-DATA1'], df['MEAN-DATA2'])
a[1]  # the p-value; a.pvalue is equivalent

which gives:

0.9922547232600507
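A self-contained version of the column-level test above may help reproduce it. It rebuilds the DataFrame from the table in the question and keeps scipy's default `equal_var=True` (Student's t-test). The result object exposes `statistic` and `pvalue` as named attributes, which reads more clearly than `a[1]`. Because the table only shows six decimals, the numbers may differ slightly from the output quoted above.

```python
import pandas as pd
from scipy import stats

# Data copied from the table in the question (one row per atom)
df = pd.DataFrame({
    "SEM-DATA1":  [0.001216, 0.002687, 0.005267, 0.027265, 0.029974],
    "MEAN-DATA1": [0.145842, 0.255069, 0.321345, 0.906731, 0.773725],
    "SEM-DATA2":  [0.000959, 0.001368, 0.003722, 0.033637, 0.150025],
    "MEAN-DATA2": [0.143103, 0.250505, 0.305767, 0.731638, 0.960804],
})

# Independent two-sample t-test: each MEAN column is treated as one sample of 5 values
result = stats.ttest_ind(df["MEAN-DATA1"], df["MEAN-DATA2"])

# Named attribute instead of positional indexing (result[1])
p_value = result.pvalue
print(result.statistic, p_value)
```

Note that this yields a single p-value for the whole set of atoms, not one per row.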

EDIT

A clarification is in order here. A t-test (which is how you obtain this "p-value") asks whether two samples plausibly come from populations with the same mean. Testing two single values against each other will give NaN, because the variance of a one-element sample is undefined.
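When each row carries a mean and its standard error, as in the question, a per-row comparison is still possible, but only under extra assumptions. The sketch below is one such approach, not something the answer above proposes. It assumes the two estimates per atom are independent and approximately normal. With no sample size available, it uses a large-sample z-test, z = (m1 - m2) / sqrt(SEM1² + SEM2²). If the number of observations behind each mean is known, the hypothetical `n_obs` argument switches to Welch's t-test via `scipy.stats.ttest_ind_from_stats`, converting each SEM back to a standard deviation with SD = SEM * sqrt(n). The helper name `per_row_pvalues` and the value `n_obs=10` are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats


def per_row_pvalues(df, n_obs=None):
    """Hypothetical helper: two-sided p-value per row, MEAN-DATA1 vs MEAN-DATA2.

    Assumes independent, approximately normal estimates.
    n_obs=None -> large-sample z-test built from the SEMs alone.
    n_obs=int  -> Welch's t-test from summary statistics (same n in both groups).
    """
    m1, se1 = df["MEAN-DATA1"].to_numpy(), df["SEM-DATA1"].to_numpy()
    m2, se2 = df["MEAN-DATA2"].to_numpy(), df["SEM-DATA2"].to_numpy()

    if n_obs is None:
        # Standard error of the difference between two independent means
        z = (m1 - m2) / np.sqrt(se1**2 + se2**2)
        return pd.Series(2 * stats.norm.sf(np.abs(z)), index=df.index)

    # Convert SEM back to sample standard deviation: SD = SEM * sqrt(n)
    sd1, sd2 = se1 * np.sqrt(n_obs), se2 * np.sqrt(n_obs)
    result = stats.ttest_ind_from_stats(m1, sd1, n_obs, m2, sd2, n_obs, equal_var=False)
    return pd.Series(result.pvalue, index=df.index)


# Data copied from the table in the question (one row per atom)
df = pd.DataFrame({
    "SEM-DATA1":  [0.001216, 0.002687, 0.005267, 0.027265, 0.029974],
    "MEAN-DATA1": [0.145842, 0.255069, 0.321345, 0.906731, 0.773725],
    "SEM-DATA2":  [0.000959, 0.001368, 0.003722, 0.033637, 0.150025],
    "MEAN-DATA2": [0.143103, 0.250505, 0.305767, 0.731638, 0.960804],
})

df["P-VALUE"] = per_row_pvalues(df)                # z-test, no sample size needed
df["P-VALUE-N10"] = per_row_pvalues(df, n_obs=10)  # assumes 10 observations per mean
print(df)
```

With equal `n_obs` the Welch statistic equals the z statistic, but the t distribution has heavier tails, so its p-values are always somewhat larger. With many atoms, some correction for multiple comparisons (e.g. Bonferroni or false discovery rate) is usually advisable.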
