
Pandas Groupby based on several values in a column

I have a DataFrame. For simplicity, let's assume this is my df:

A B C
1 4 7
1 5 4
1 6 2

What I want to do is group by A and B, where one group of B is [4,6] and the other is 5. Let's say my aggregation function is sum on C, so I want the result to be:

A   B  Sum(C)
1 [4,6]  9
1   5    4

I know I can add an additional column to indicate whether the value is in [4,6], but is there a more elegant way?
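For reference, the helper-column approach mentioned above might look like the following minimal sketch. The column name `B_group` and the variable names are assumptions made for illustration; the sample data is the frame from the question.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [1, 1, 1], "B": [4, 5, 6], "C": [7, 4, 2]})

# B_group is a hypothetical helper column: True where B is 4 or 6
helper = df.assign(B_group=df["B"].isin([4, 6]))

# Sum C within each (A, B_group) pair
baseline = helper.groupby(["A", "B_group"], as_index=False)["C"].sum()
print(baseline)
```

The boolean flag only distinguishes "in [4,6]" from "everything else", so it fits this two-group case but not several merged groups at once.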

Not so easy.

First I use replace so that the values to be merged share one key in groupby, then agg with a custom function for B and sum for C:

# map 4 to 6 so that both values land in the same group
d = {4: 6}
df = df.groupby(['A', df.B.replace(d)]) \
       .agg({'B': lambda x: x.tolist() if len(x) > 1 else x.iat[0], 'C': 'sum'}) \
       .reset_index(level=1, drop=True) \
       .reset_index() \
       .reindex(columns=df.columns)  # restore the original column order
print(df)
   A       B  C
0  1       5  4
1  1  [4, 6]  9

If tuples are acceptable instead of lists:

# define the groups, covering every value in column B
d = {'a': [5], 'b': [4, 6]}
# map each value to the tuple of its whole group
d = {k: tuple(d[oldk]) for oldk, oldv in d.items() for k in oldv}
print(d)
{5: (5,), 4: (4, 6), 6: (4, 6)}

df = df.groupby(['A', df.B.map(d)])['C'].sum().reset_index()
print (df)
   A       B  C
0  1  (4, 6)  9
1  1    (5,)  4
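The mapping above has to list every value of B, including the ungrouped 5. A minimal sketch of a variation that lists only the merged groups and lets every other value form its own group is shown here. The names `groups`, `lookup`, and `key` are assumptions, and the group labels are strings rather than tuples so that all keys share one easily sortable type.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [1, 1, 1], "B": [4, 5, 6], "C": [7, 4, 2]})

# Only the merged groups are listed (hypothetical name `groups`)
groups = [(4, 6)]

# Map each grouped value to a string label such as "4,6"
lookup = {value: ",".join(map(str, group)) for group in groups for value in group}

# Values without an entry fall back to their own string form
key = df["B"].map(lambda value: lookup.get(value, str(value)))

# Group by A and the derived key; df itself is left unchanged
result = df.groupby(["A", key])["C"].sum().reset_index()
print(result)
```

Because `key` keeps the name B, the grouped column in `result` is still called B.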

Try df.groupby(["A", "B"]).sum()
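For comparison, a quick sketch of what this call returns on the sample data from the question (the variable name `plain` is an assumption). Each distinct value of B stays its own group, so 4 and 6 are not merged.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [1, 1, 1], "B": [4, 5, 6], "C": [7, 4, 2]})

# Plain groupby on both columns: one group per distinct (A, B) pair
plain = df.groupby(["A", "B"]).sum()
print(plain)
```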
