
Pandas GroupBy - Applying function to each group while preserving original order

I'm wondering if there's an easy way to apply a function to each group in a DataFrame, where the function returns a Series of the same length as the group it receives, while preserving the original order of the indices.

Here's a toy DataFrame which I'll use to give an example:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(10,2),columns=['x1','x2'])
>>> df['group'] = np.random.choice(list('ABC'),size=10)
>>> df
         x1        x2 group
0  0.710005  0.632971     C
1  0.384604  0.417906     C
2  0.307053  0.869622     C
3  0.699528  0.026040     A
4  0.773514  0.391718     C
5  0.602334  0.936036     C
6  0.872275  0.162393     C
7  0.641256  0.147996     B
8  0.047188  0.358093     C
9  0.059955  0.353174     B

It's easy enough to apply a function that depends on only one column and get back a single Series in the original order. For example:

>>> df.groupby('group')['x1'].apply(lambda x: (x-x.mean())/x.std())
0    0.618951
1   -0.488499
2   -0.752430
3         NaN
4    0.835095
5    0.252510
6    1.171211
7    0.707107
8   -1.636838
9   -0.707107
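Whether apply returns a flat Series like the one above seems to depend on the pandas version (as far as I know, newer releases prepend the group key to the index by default). As a minimal sketch, transform with the same lambda returns a result aligned to the original index. The small frame below is made-up illustration data, not the data from the question.

```python
import pandas as pd

# Hypothetical toy data (not the values from the question).
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
    "group": ["A", "B", "A", "B", "A", "B"],
})

# transform returns a result with the same index as df,
# so the rows stay in their original order.
z = df.groupby("group")["x1"].transform(lambda x: (x - x.mean()) / x.std())
print(z)
```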

However, if the function depends on multiple columns, the result is a multi-indexed Series that does not preserve the original order:

>>> df.groupby('group').apply(lambda grp: grp['x1']/grp['x2'].mean())
group   
A      3    26.863693
B      7     2.559033
       9     0.239262
C      0     1.318752
       1     0.714357
       2     0.570315
       4     1.436714
       5     1.118766
       6     1.620150
       8     0.087646

The desired output is instead this:

>>> res = []
>>> for idx, grp in df.groupby('group'):
...     res.append(grp['x1'] / grp['x2'].mean())
... 
>>> pd.concat(res).sort_index()
0     1.318752
1     0.714357
2     0.570315
3    26.863693
4     1.436714
5     1.118766
6     1.620150
7     2.559033
8     0.087646
9     0.239262

This loop + concat does what is needed; I'm just wondering if there's a more elegant way using apply.

I am not sure you need apply here, but we could always use Series.sort_index at the end. Sorting on level 1 restores the original row order while keeping the group level in the index:

df.groupby('group').apply(lambda grp: grp['x1']/grp['x2'].mean()).sort_index(level = 1)
group   
B      0    0.946438
C      1    2.273879
A      2    0.167197
       3    1.378490
C      4    0.320788
       5    0.085125
A      6    1.165615
B      7    1.622586
C      8    1.763416
       9    1.817172
Name: x1, dtype: float64

A method using transform:

g = df.groupby('group')
s = (df - g.transform('mean')) / g.transform('std')
s
Out[33]: 
  group        x1        x2
0   NaN  0.618951  0.332083
1   NaN -0.488498 -0.423041
2   NaN -0.752430  1.162998
3   NaN       NaN       NaN
4   NaN  0.835094 -0.514991
5   NaN  0.252511  1.396187
6   NaN  1.171211 -1.320183
7   NaN  0.707107 -0.707107
8   NaN -1.636838 -0.633053
9   NaN -0.707107  0.707107
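# Drop the all-NaN 'group' column. how='all' is needed: the default how='any'
# would also drop x1 and x2, because row 3 is NaN in both.
s = s.dropna(axis=1, how='all')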
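The same transform idea also seems to cover the multi-column case from the question without apply: transform('mean') broadcasts each group's mean of x2 back onto the original index, so an ordinary column division stays in the original order. A minimal sketch, using made-up data rather than the frame from the question:

```python
import pandas as pd

# Hypothetical toy data (not the values from the question).
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "group": ["C", "A", "C", "B", "A"],
})

# Per-row group mean of x2, aligned to df.index.
x2_group_mean = df.groupby("group")["x2"].transform("mean")

# Element-wise division keeps the original row order; no sorting needed.
ratio = df["x1"] / x2_group_mean
print(ratio)
```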
