Pandas Groupby based on several values in a column
I have a DataFrame. For simplicity, let's assume this is my df:
A B C
1 4 7
1 5 4
1 6 2
What I want to do is group by A and B, where one group of B is [4,6] and the other is 5. Let's say my aggregation function is Sum on C, so I want the result to be:
A B Sum(C)
1 [4,6] 9
1 5 4
I know I can add an additional column to indicate whether the value is in [4,6], but is there a more elegant way?
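For reference, the extra-column approach mentioned above might look like the following minimal sketch. The column name in_4_6 is made up for illustration, and the frame is rebuilt from the sample data shown earlier.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 1, 1], "B": [4, 5, 6], "C": [7, 4, 2]})

# Helper column (hypothetical name): True where B belongs to the [4, 6] group.
df["in_4_6"] = df["B"].isin([4, 6])

# Group on A and the helper flag, then sum C within each group.
result = df.groupby(["A", "in_4_6"])["C"].sum().reset_index()
print(result)
```

This works, but it leaves an extra column on df, which is what the question is trying to avoid.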
Not so easy.
First I use replace so that values belonging to the same group share one groupby key, then agg with a custom function for B and sum for C:
# 4 and 6 belong to the same group, so map 4 onto 6 for the grouping key
d = {4: 6}
df = df.groupby(['A', df.B.replace(d)]) \
       .agg({'B': lambda x: x.tolist() if len(x) > 1 else x.iat[0], 'C': 'sum'}) \
       .reset_index(level=1, drop=True) \
       .reset_index() \
       .reindex(columns=df.columns)  # reindex_axis was removed in pandas 1.0
print (df)
A B C
0 1 5 4
1 1 [4, 6] 9
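As a self-contained variant of the snippet above, here is a sketch that builds the sample frame itself and uses named aggregation. The key name "key" is an assumption chosen so the grouping Series does not collide with column B. This is one possible way to write it, not the only one.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 1, 1], "B": [4, 5, 6], "C": [7, 4, 2]})

# Map 4 onto 6 so both values share one grouping key.
# "key" is a hypothetical name used only for the temporary index level.
key = df["B"].replace({4: 6}).rename("key")

result = (
    df.groupby(["A", key])
      .agg(
          # Keep a list of the original B values when a group has several,
          # otherwise keep the single value as a scalar.
          B=("B", lambda s: s.tolist() if len(s) > 1 else s.iat[0]),
          C=("C", "sum"),
      )
      .reset_index(level="key", drop=True)  # drop the temporary key level
      .reset_index()                        # bring A back as a column
)
print(result)
```

Because the key Series is passed directly to groupby, df itself gains no extra column.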
If tuples are acceptable instead of lists:
# define the groups so that every value of column B is covered
d = {'a': [5], 'b': [4, 6]}
# map each value of B to the tuple of all values in its group
d = {k: tuple(oldv) for oldk, oldv in d.items() for k in oldv}
print (d)
{4: (4, 6), 5: (5,), 6: (4, 6)}
df = df.groupby(['A', df.B.map(d)])['C'].sum().reset_index()
print (df)
A B C
0 1 (4, 6) 9
1 1 (5,) 4
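If plain labels are preferred over tuples, the same mapping idea can be sketched with strings. The labels "4,6" and "5" and the names groups and label_of are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 1, 1], "B": [4, 5, 6], "C": [7, 4, 2]})

# Hypothetical labels: each label lists the B values it covers.
groups = {"4,6": [4, 6], "5": [5]}

# Invert to a value -> label lookup, e.g. 4 -> "4,6".
label_of = {value: label for label, values in groups.items() for value in values}

# The mapped Series is used only as a grouping key; df is left unchanged.
key = df["B"].map(label_of)
result = df.groupby(["A", key])["C"].sum().reset_index()
print(result)
```

Any value of B missing from the lookup would map to NaN and be dropped by groupby by default, so the groups should cover every value that matters.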
Try df.groupby(["A", "B"]).sum()