Converting a DataFrame for a statsmodels t-test

Question

I'm trying to run a t-test in pandas/statsmodels to compare differences in performance between two groups, but I'm having trouble getting the data into a format that statsmodels can use in a reasonable way.

My pandas dataframe currently looks like this:

Treatment      Performance
a              2
b              3
a              2
a              1
b              0

My understanding is that to perform a t-test I need the data organized by treatment, like so:

TreatmentA    TreatmentB
2             3
2             0
1

This code almost does the trick:

cat1 = df.groupby('Treatment', as_index=False).groups['a']
cat2 = df.groupby('Treatment', as_index=False).groups['b']
print(ttest_ind(cat1, cat2))

But when I print the result, it appears to pull the row indices where that treatment occurred instead of the performance values:

print(cat1)
[0, 2, 4, 5, 9, 10, 11, 16, 18,...131, 133, 142, 147, 152, 153, 156, 157, 158]

It (maybe?) needs to be something more like this:

print(cat1)
[2, 2, 1, ...0, 3, 1, 1, 0, 2, 0, 0, 0]

What is the best way to convert this dataframe into a format that I can perform t-tests on?

Answer 1

I think the simplest way is to do it like this:

ttest_ind(df[df['Treatment'] == 'a']['Performance'], df[df['Treatment'] == 'b']['Performance'])
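
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea using the five sample rows from the question. The question does not say where `ttest_ind` was imported from, so the `scipy.stats` import below is an assumption. The names `perf_a` and `perf_b` are only illustrative.

```python
import pandas as pd
from scipy.stats import ttest_ind  # assumption: the thread does not name the import

# The five sample rows shown in the question
df = pd.DataFrame({
    "Treatment": ["a", "b", "a", "a", "b"],
    "Performance": [2, 3, 2, 1, 0],
})

# A boolean mask selects the Performance values (not the row labels) per group
perf_a = df.loc[df["Treatment"] == "a", "Performance"]
perf_b = df.loc[df["Treatment"] == "b", "Performance"]

print(perf_a.tolist())  # [2, 2, 1]
print(perf_b.tolist())  # [3, 0]

# The two groups may have different lengths; no wide table is needed
result = ttest_ind(perf_a, perf_b)
print(result.statistic, result.pvalue)
```

Note that the two samples are passed as separate sequences, so the long layout in the question is already enough and no reshaping into `TreatmentA`/`TreatmentB` columns is required.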

Hope it helps.
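
As a side note on the `groupby` attempt in the question: `.groups` maps each group key to the row labels of its members, which is why indices were printed. A hedged sketch of a `groupby`-based variant follows, using `get_group` to fetch the values instead. It assumes the statsmodels function `statsmodels.stats.weightstats.ttest_ind`, which may or may not be the one the question used, and it reuses the sample rows from the question.

```python
import pandas as pd
from statsmodels.stats.weightstats import ttest_ind  # assumption: statsmodels variant

# The five sample rows shown in the question
df = pd.DataFrame({
    "Treatment": ["a", "b", "a", "a", "b"],
    "Performance": [2, 3, 2, 1, 0],
})

grouped = df.groupby("Treatment")

# .groups holds row labels per key, not the Performance values
print(list(grouped.groups["a"]))  # [0, 2, 3]

# get_group returns the rows themselves; select the column to get the values
perf_a = grouped.get_group("a")["Performance"]
perf_b = grouped.get_group("b")["Performance"]

# statsmodels returns the t statistic, the p-value, and the degrees of freedom
tstat, pvalue, dof = ttest_ind(perf_a, perf_b)
print(tstat, pvalue, dof)
```

By default this variant pools the variances of the two groups; passing `usevar="unequal"` switches to Welch's test if equal variances cannot be assumed.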

Answer 1 (accepted), 2015-04-09 09:35:07