Pandas groupby based on multiple values in a column

Question

I have a DataFrame. For simplicity, let's assume this is my df:

What I want to do is group by A and B, where one group of B is [4, 6] and the other is 5. Let's say my aggregation function is sum on C, so I want the result to be:

A   B  Sum(C)
1 [4,6]  9
1   5    4

I know I can add an additional column to indicate whether the value is in [4, 6], but is there a more elegant way?

Answer 1

This is not so easy.

First, use replace so that the values that belong together share one key and group by that, then aggregate B with a custom function and C with sum:

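# 4 and 6 belong to the same group: map 4 to 6 for grouping only
d = {4: 6}
# reindex_axis was removed in pandas 1.0; reindex(columns=...) restores the original column order
df = df.groupby(['A', df.B.replace(d)]) \
       .agg({'B': lambda x: x.tolist() if len(x) > 1 else x.iat[0], 'C': 'sum'}) \
       .reset_index(level=1, drop=True) \
       .reset_index() \
       .reindex(columns=df.columns)
print(df)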
   A       B  C
0  1       5  4
1  1  [4, 6]  9

If tuples are acceptable instead of lists:

# define the groups covering all values of column B (the keys 'a' and 'b' are arbitrary)
d = {'a': [5], 'b': [4, 6]}
# map each value to the tuple of its whole group
d = {k: tuple(d[oldk]) for oldk, oldv in d.items() for k in oldv}
print(d)
{4: (4, 6), 5: (5,), 6: (4, 6)}

df = df.groupby(['A', df.B.map(d)])['C'].sum().reset_index()
print (df)
   A       B  C
0  1  (4, 6)  9
1  1    (5,)  4
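
Since the question's DataFrame is not shown, here is a minimal self-contained sketch of the tuple-mapping approach. The data and the names groups, label_of and result are made up for illustration; the values were picked only so that the sums match the desired output.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [4, 5, 6, 4],
    "C": [2, 4, 3, 4],
})

# Each tuple lists the B values that should end up in one group.
groups = [(4, 6), (5,)]

# Map every individual B value to the tuple of its group.
label_of = {value: group for group in groups for value in group}

# Group by A and by the mapped B labels, then sum C.
result = (
    df.groupby(["A", df["B"].map(label_of)])["C"]
      .sum()
      .reset_index()
)
print(result)
```

The original df is left untouched here, so the same frame can be reused to try other groupings.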

Answer 2

Try df.groupby(["A", "B"]).sum()
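
For comparison, a small sketch of what this call returns on made-up data (the DataFrame below is hypothetical, since the question's df is not shown). It produces one row per distinct value of B, so 4 and 6 remain separate groups.

```python
import pandas as pd

# Hypothetical data: the original df is not shown in the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [4, 5, 6, 4],
    "C": [2, 4, 3, 4],
})

# Plain groupby on A and B: every distinct B value forms its own group.
plain = df.groupby(["A", "B"]).sum().reset_index()
print(plain)
```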

Answer 1
1, accepted, 2017-07-23 08:23:24

Answer 2
-1, 2017-07-23 08:16:32
