How to get p-value for each row of two columns in pandas DataFrame?
I would like to ask for suggestions on how to calculate a p-value for each row of my pandas DataFrame. My DataFrame looks like this: there are columns with the means of Data1 and Data2, and also columns with the standard errors of those means. Each row represents one atom. Thus I need to calculate a p-value for each row (that is, for example, compare the mean of atom 1 from Data1 with the mean of atom 1 from Data2).
   SEM-DATA1  MEAN-DATA1  SEM-DATA2  MEAN-DATA2
0   0.001216    0.145842   0.000959    0.143103
1   0.002687    0.255069   0.001368    0.250505
2   0.005267    0.321345   0.003722    0.305767
3   0.027265    0.906731   0.033637    0.731638
4   0.029974    0.773725   0.150025    0.960804
I found here on Stack that many people recommend using scipy, but I don't know how to apply it in the way I need. Is it possible? Thank you.
You are comparing two samples, df['MEAN...1'] and df['MEAN...2'], so you should do this:
from scipy import stats
stats.ttest_ind(df['MEAN-DATA1'],df['MEAN-DATA2'])
which returns:
Ttest_indResult(statistic=0.01001479441863673, pvalue=0.9922547232600507)
or, if you only want the p-value:
a = stats.ttest_ind(df['MEAN-DATA1'],df['MEAN-DATA2'])
a[1]
which gives:
0.9922547232600507
EDIT
A clarification is in order here. A t-test (or the acquisition of a "p-value") is aimed at finding out whether two distributions come from the same population (or sample). Testing two single values will give NaN.
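If a per-row comparison is still wanted, one possible route is to run the test from summary statistics rather than from raw values, since each row already carries a mean and a standard error. The sketch below uses scipy.stats.ttest_ind_from_stats with Welch's correction. It is only an illustration under a loud assumption: the number of observations behind each mean (n1, n2) is not given in the question, so the value 10 is hypothetical and must be replaced with the real sample sizes. The standard deviation is recovered from SEM = SD / sqrt(n).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Data copied from the question.
df = pd.DataFrame({
    "SEM-DATA1":  [0.001216, 0.002687, 0.005267, 0.027265, 0.029974],
    "MEAN-DATA1": [0.145842, 0.255069, 0.321345, 0.906731, 0.773725],
    "SEM-DATA2":  [0.000959, 0.001368, 0.003722, 0.033637, 0.150025],
    "MEAN-DATA2": [0.143103, 0.250505, 0.305767, 0.731638, 0.960804],
})

# ASSUMPTION: observations per mean. Not stated in the question;
# 10 is a placeholder, substitute the real sample sizes.
n1 = n2 = 10

# SEM = SD / sqrt(n), hence SD = SEM * sqrt(n).
sd1 = df["SEM-DATA1"].to_numpy() * np.sqrt(n1)
sd2 = df["SEM-DATA2"].to_numpy() * np.sqrt(n2)

# Welch's t-test (unequal variances), evaluated element-wise, one test per row.
result = stats.ttest_ind_from_stats(
    df["MEAN-DATA1"].to_numpy(), sd1, n1,
    df["MEAN-DATA2"].to_numpy(), sd2, n2,
    equal_var=False,
)

df["p-value"] = result.pvalue
print(df)
```

With Welch's test the t statistic itself depends only on the two means and the two SEMs; the assumed sample sizes enter through the degrees of freedom, so the p-values shift when n1 and n2 change.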