Standardize pandas groupby results

Question

I am using pandas to get subgroup averages, and the basics work fine. For instance,

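import numpy as np
import pandas as pd
from pprint import pprint

d = np.array([[1, 4], [1, 1], [0, 1], [1, 1]])
m = d.mean(axis=1)  # row means: [2.5, 1.0, 0.5, 1.0]

p = pd.DataFrame(m, index='A1,A2,B1,B2'.split(','), columns=['Obs'])
pprint(p)

# Group by the first character of each label (A / B)
x = p.groupby([v[0] for v in p.index])
pprint(x.mean())

# Group by the second character of each label (1 / 2)
x = p.groupby([v[1] for v in p.index])
pprint(x.mean())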

Yields:

    Obs
A1  2.5
A2  1.0
B1  0.5
B2  1.0

    Obs
A  1.75  <<< 1.75 is (2.5 + 1.0) / 2
B  0.75

   Obs
1  1.5
2  1.0

But I also need to know how much A and B (and likewise 1 and 2) deviate from their common mean. That is, I'd like tables like:

    Obs   Dev
A  1.75  0.50  <<< 1.75 - 1.25 = 0.50, where 1.25 = (1.75 + 0.75) / 2 is the common mean
B  0.75 -0.50  <<< 0.75 - 1.25 = -0.50

   Obs    Dev
1  1.5   0.25
2  1.0  -0.25
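Written as a formula (the symbols G, the group means \bar{x}_g, and Dev_g are notation introduced here for illustration), the requested deviation for each of G groups is the group mean minus the unweighted mean of all group means, which reproduces the numbers in both tables:

```latex
\mathrm{Dev}_g = \bar{x}_g - \frac{1}{G}\sum_{h=1}^{G}\bar{x}_h,
\qquad
\mathrm{Dev}_A = 1.75 - \frac{1.75 + 0.75}{2} = 0.50,
\qquad
\mathrm{Dev}_1 = 1.5 - \frac{1.5 + 1.0}{2} = 0.25
```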

I can do this using loc, apply, etc., but this seems silly. Can anyone think of an elegant way to do this using groupby or something similar?

Answer 1

Aggregate the group means, then subtract the mean of those means:

(p.groupby(p.index.str[0])
  .agg(Obs=('Obs', 'mean'))
  .assign(Dev=lambda d: d['Obs']-d['Obs'].mean())
)
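A self-contained sketch of the snippet above, rebuilding `p` from the question and wrapping the chain in a helper so that both requested tables (grouped by letter and by digit) come out of the same code. The helper name `group_means_with_dev` is hypothetical, and it assumes every index label is one letter followed by one digit:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: Obs = row means of d = [2.5, 1.0, 0.5, 1.0]
d = np.array([[1, 4], [1, 1], [0, 1], [1, 1]])
p = pd.DataFrame(d.mean(axis=1), index=['A1', 'A2', 'B1', 'B2'], columns=['Obs'])


def group_means_with_dev(df, key):
    """Hypothetical helper: per-group mean of Obs, plus its deviation
    from the unweighted mean of those group means."""
    return (
        df.groupby(key)
        .agg(Obs=('Obs', 'mean'))                          # one row per group
        .assign(Dev=lambda g: g['Obs'] - g['Obs'].mean())  # subtract mean of means
    )


by_letter = group_means_with_dev(p, p.index.str[0])  # groups A, B
by_digit = group_means_with_dev(p, p.index.str[1])   # groups 1, 2
print(by_letter)
print(by_digit)
```

Passing the grouping key as an argument keeps the aggregation and deviation logic in one place, so only the label slice changes between the two tables.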

Or, if the groups contain different numbers of items and you want the difference from the overall mean (not the mean of means):

(p.groupby(p.index.str[0])
  .agg(Obs=('Obs', 'mean'))
  .assign(Dev=lambda d: d['Obs']-p['Obs'].mean())  # note: p (all original rows), not d (the group means)
)

Output (both versions give the same result here, because every group has two rows):

    Obs  Dev
A  1.75  0.5
B  0.75 -0.5
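The two snippets only disagree when the groups have different sizes. A small sketch with made-up data (three A rows and one B row; the frame `q` and its values are hypothetical) shows how the mean-of-means baseline and the overall-mean baseline diverge:

```python
import pandas as pd

# Hypothetical data with unequal group sizes: three A rows, one B row.
q = pd.DataFrame({'Obs': [1.0, 2.0, 3.0, 6.0]}, index=['A1', 'A2', 'A3', 'B1'])

means = q.groupby(q.index.str[0]).agg(Obs=('Obs', 'mean'))  # A -> 2.0, B -> 6.0

# Baseline 1: unweighted mean of the group means, (2.0 + 6.0) / 2 = 4.0
dev_mean_of_means = means['Obs'] - means['Obs'].mean()
# Baseline 2: mean over all original rows, (1 + 2 + 3 + 6) / 4 = 3.0
dev_overall = means['Obs'] - q['Obs'].mean()

print(pd.DataFrame({'Obs': means['Obs'],
                    'Dev_mean_of_means': dev_mean_of_means,
                    'Dev_overall': dev_overall}))
```

Here the overall mean is pulled toward group A because A contributes three rows, so the choice of baseline depends on whether every group should count equally or every row should.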

Answer 1 (2, 2022-09-09 15:31:10)