Standardize pandas groupby results

I am using pandas to get subgroup averages, and the basics work fine. For instance,

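import numpy as np
import pandas as pd
from pprint import pprint

d = np.array([[1,4],[1,1],[0,1],[1,1]])
m = d.mean(axis=1)  # row means: 2.5, 1.0, 0.5, 1.0

p = pd.DataFrame(m,index='A1,A2,B1,B2'.split(','),columns=['Obs'])
pprint(p)

# group by the first index character (A/B)
x = p.groupby([v[0] for v in p.index])
pprint(x.mean())

# group by the second index character (1/2)
x = p.groupby([v[1] for v in p.index])
pprint(x.mean())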

Yields:

    Obs
A1  2.5
A2  1.0
B1  0.5
B2  1.0

    Obs
A  1.75  <<< 1.75 is (2.5 + 1.0) / 2
B  0.75

   Obs
1  1.5
2  1.0

But I also need to know how much A and B (and likewise 1 and 2) deviate from their common mean. That is, I'd like to have tables like:

    Obs   Dev
A  1.75  0.50  <<< deviation of the Obs average, i.e., 1.75 - 1.25
B  0.75 -0.50  <<< 0.75 - 1.25 = -0.50

   Obs    Dev
1  1.5   0.25
2  1.0  -0.25

I can do this using loc, apply, etc., but this seems silly. Can anyone think of an elegant way to do this using groupby or something similar?

Aggregate the means, then compute each group's difference from the mean of the group means:

(p.groupby(p.index.str[0])
  .agg(Obs=('Obs', 'mean'))
  .assign(Dev=lambda d: d['Obs']-d['Obs'].mean())
)
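The question also asks for the same table grouped by the trailing digit (1 and 2). A minimal sketch of how the pattern above might be reused for both groupings, assuming the index labels are always one letter followed by one digit; `group_dev` and its `pos` parameter are hypothetical names introduced only for this illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
d = np.array([[1, 4], [1, 1], [0, 1], [1, 1]])
p = pd.DataFrame(d.mean(axis=1), index=["A1", "A2", "B1", "B2"], columns=["Obs"])


def group_dev(frame, pos):
    """Group by the index character at `pos` and add the deviation of each
    group mean from the mean of the group means.

    `group_dev` and `pos` are hypothetical names used only for this sketch.
    """
    return (
        frame.groupby(frame.index.str[pos])
        .agg(Obs=("Obs", "mean"))
        .assign(Dev=lambda g: g["Obs"] - g["Obs"].mean())
    )


by_letter = group_dev(p, 0)  # groups A and B
by_digit = group_dev(p, 1)   # groups 1 and 2
print(by_letter)
print(by_digit)
```

With the data from the question, `by_digit` should reproduce the second desired table (Dev of 0.25 and -0.25).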

Or, if the groups have different numbers of items and you want the difference from the overall mean (not the mean of means):

(p.groupby(p.index.str[0])
  .agg(Obs=('Obs', 'mean'))
  .assign(Dev=lambda d: d['Obs']-p['Obs'].mean()) # notice the p (not d)
)

Output:

    Obs  Dev
A  1.75  0.5
B  0.75 -0.5
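With the balanced data in the question, both variants give the same numbers, because every group has two rows. A small sketch of where they diverge, using made-up unbalanced data (the frame `q` and the column names `DevGroupMean` and `DevOverall` are assumptions for illustration, not part of the original question):

```python
import pandas as pd

# Hypothetical unbalanced data: three "A" rows, one "B" row.
q = pd.DataFrame({"Obs": [1.0, 2.0, 3.0, 6.0]}, index=["A1", "A2", "A3", "B1"])

means = q.groupby(q.index.str[0]).agg(Obs=("Obs", "mean"))
mean_of_means = means["Obs"].mean()  # every group counts equally -> 4.0
overall_mean = q["Obs"].mean()       # every row counts equally   -> 3.0

result = means.assign(
    DevGroupMean=means["Obs"] - mean_of_means,
    DevOverall=means["Obs"] - overall_mean,
)
print(result)
```

Here the mean of means weights A and B equally, while the overall mean is pulled toward A because A contributes more rows, so the two deviation columns differ.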
