繁体   English   中英

熊猫groupby sum

[英]Pandas groupby sum

我有一个数据框,如下所示:

ref, type, amount
001, foo, 10
001, foo, 5
001, bar, 50
001, bar, 5
001, test, 100
001, test, 90
002, foo, 20
002, foo, 35
002, bar, 75
002, bar, 80
002, test, 150
002, test, 110

这就是我想要得到的:

ref, type, amount, foo, bar, test
001, foo, 10, 15, 55, 190
001, foo, 5, 15, 55, 190
001, bar, 50, 15, 55, 190
001, bar, 5, 15, 55, 190
001, test, 100, 15, 55, 190
001, test, 90, 15, 55, 190
002, foo, 20, 55, 155, 260
002, foo, 35, 55, 155, 260
002, bar, 75, 55, 155, 260
002, bar, 80, 55, 155, 260
002, test, 150, 55, 155, 260
002, test, 110, 55, 155, 260

所以我有这个:

df.groupby('ref')['amount'].transform(sum)

但是,如何过滤它,使其仅适用于type=foobartest

使用数据透视表的解决方案:

>>> b = pd.pivot_table(df, values='amount', index=['ref'], columns=['type'], aggfunc=np.sum)
>>> b
type  bar  foo  test
ref
1      55   15   190
2     155   55   260

>>> pd.merge(df, b, left_on='ref', right_index=True)
    ref  type  amount  bar  foo  test
0     1   foo      10   55   15   190
1     1   foo       5   55   15   190
2     1   bar      50   55   15   190
3     1   bar       5   55   15   190
4     1  test     100   55   15   190
5     1  test      90   55   15   190
6     2   foo      20  155   55   260
7     2   foo      35  155   55   260
8     2   bar      75  155   55   260
9     2   bar      80  155   55   260
10    2  test     150  155   55   260
11    2  test     110  155   55   260

我认为你需要groupbyunstack ,然后merge到原始DataFrame

df1 = df.groupby(['ref','type'])['amount'].sum().unstack().reset_index()
print (df1)
type  ref  bar  foo  test
0     001   55   15   190
1     002  155   55   260

df = pd.merge(df, df1, on='ref')
print (df)
    ref  type  amount  sums  bar  foo  test
0   001   foo      10    15   55   15   190
1   001   foo       5    15   55   15   190
2   001   bar      50    55   55   15   190
3   001   bar       5    55   55   15   190
4   001  test     100   190   55   15   190
5   001  test      90   190   55   15   190
6   002   foo      20    55  155   55   260
7   002   foo      35    55  155   55   260
8   002   bar      75   155  155   55   260
9   002   bar      80   155  155   55   260
10  002  test     150   260  155   55   260
11  002  test     110   260  155   55   260

时间

In [506]: %timeit (pd.merge(df, df.groupby(['ref','type'])['amount'].sum().unstack().reset_index(), on='ref'))
100 loops, best of 3: 3.4 ms per loop

In [507]: %timeit (pd.merge(df, pd.pivot_table(df, values='amount', index=['ref'], columns=['type'], aggfunc=np.sum), left_on='ref', right_index=True))
100 loops, best of 3: 4.99 ms per loop

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM