Pandas Groupby Based on Values in Multiple Columns
I have a dataframe that I am trying to use pandas.groupby on to get the cumulative sum. The values that I am grouping by show up in two different columns, and I am having trouble getting the groupby to work correctly. My starting dataframe is:
df = pd.DataFrame({'col_A': ['red', 'red', 'blue', 'red'], 'col_B': ['blue', 'red', 'blue', 'red'], 'col_A_qty': [1, 1, 1, 1], 'col_B_qty': [1, 1, 1, 1]})
col_A col_B col_A_qty col_B_qty
red blue 1 1
red red 1 1
blue blue 1 1
red red 1 1
The result I am trying to get is:
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
red blue 1 1 1 1
red red 1 1 3 1
blue blue 1 1 3 3
red red 1 1 5 3
I've tried:
df.groupby(['col_A', 'col_B'])['col_A_qty'].cumsum()
but this groups on the combination of col_A and col_B. How can I use pandas.groupby to calculate the cumulative sum of red and blue, regardless of whether the value is in col_A or col_B?
Try two pivots, one per column, and add their cumulative sums:
out = pd.pivot(df,columns='col_A',values='col_A_qty').fillna(0).cumsum().add(pd.pivot(df,columns='col_B',values='col_B_qty').fillna(0).cumsum(),fill_value=0)
Out[404]:
col_A blue red
0 1.0 1.0
1 1.0 3.0
2 3.0 3.0
3 3.0 5.0
df = df.join(out)
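The one-liner above can be easier to follow when split into steps. The following is a minimal sketch of the same idea, not a different method; the intermediate names a_wide and b_wide and the final result frame are introduced here for illustration only. It also renames the pivoted columns so the joined frame uses the red_cumsum / blue_cumsum names from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'col_A': ['red', 'red', 'blue', 'red'],
    'col_B': ['blue', 'red', 'blue', 'red'],
    'col_A_qty': [1, 1, 1, 1],
    'col_B_qty': [1, 1, 1, 1],
})

# One column per color; a row's quantity lands under its own color, 0 elsewhere.
a_wide = df.pivot(columns='col_A', values='col_A_qty').fillna(0)
b_wide = df.pivot(columns='col_B', values='col_B_qty').fillna(0)

# Running total per color in each source column, then combined.
# fill_value=0 covers a color that appears in only one of the two columns.
out = a_wide.cumsum().add(b_wide.cumsum(), fill_value=0)

# Rename 'red' -> 'red_cumsum' etc. and cast back to integers before joining.
result = df.join(out.add_suffix('_cumsum').astype(int))
print(result)
```

Because both pivots keep the original row index, the final join lines up row by row without any extra key.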
A simple method is to define each cumsum column as the sum of two Series.cumsum calls, as follows:
df['red_cumsum'] = df['col_A'].eq('red').cumsum() + df['col_B'].eq('red').cumsum()
df['blue_cumsum'] = df['col_A'].eq('blue').cumsum() + df['col_B'].eq('blue').cumsum()
In each of col_A and col_B, check for values equal to 'red' / 'blue' (the results are boolean Series). Then we use Series.cumsum on these boolean Series to get the cumulative counts. Note that this counts matches rather than summing col_A_qty and col_B_qty; the two agree here because every quantity is 1. You don't really need pandas.groupby for this use case.
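If the quantities were not all 1, counting matches would no longer give the cumulative sum. A minimal sketch of a quantity-weighted variant, assuming the same column layout as the question, is shown here; the helper name weighted_cumsum is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'col_A': ['red', 'red', 'blue', 'red'],
    'col_B': ['blue', 'red', 'blue', 'red'],
    'col_A_qty': [1, 1, 1, 1],
    'col_B_qty': [1, 1, 1, 1],
})

def weighted_cumsum(frame, color):
    """Hypothetical helper: running total of quantities for one color."""
    # Keep each quantity only where its paired column matches the color, else 0.
    a = frame['col_A_qty'].where(frame['col_A'].eq(color), 0)
    b = frame['col_B_qty'].where(frame['col_B'].eq(color), 0)
    return (a + b).cumsum()

df['red_cumsum'] = weighted_cumsum(df, 'red')
df['blue_cumsum'] = weighted_cumsum(df, 'blue')
print(df)
```

With the sample data this gives the same columns as the boolean version, since every quantity is 1.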
If col_A and col_B contain many distinct items, you can also iterate through the list of unique items, as follows:
import numpy as np
# np.unique flattens both columns and returns the sorted distinct values
for item in np.unique(df[['col_A', 'col_B']]):
    df[f'{item}_cumsum'] = df['col_A'].eq(item).cumsum() + df['col_B'].eq(item).cumsum()
Result:
print(df)
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
0 red blue 1 1 1 1
1 red red 1 1 3 1
2 blue blue 1 1 3 3
3 red red 1 1 5 3
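For readers who specifically want a groupby, one possible route is to reshape the two column pairs into a single long table first. This is a different technique from the ones above, offered only as a sketch; the names long, per_row, cums, result, and the labels 'row', 'color', 'qty' are all made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'col_A': ['red', 'red', 'blue', 'red'],
    'col_B': ['blue', 'red', 'blue', 'red'],
    'col_A_qty': [1, 1, 1, 1],
    'col_B_qty': [1, 1, 1, 1],
})

# Stack the (color, quantity) pairs from both columns into one long table,
# remembering which original row each pair came from.
long = pd.concat([
    df[['col_A', 'col_A_qty']].rename(columns={'col_A': 'color', 'col_A_qty': 'qty'}),
    df[['col_B', 'col_B_qty']].rename(columns={'col_B': 'color', 'col_B_qty': 'qty'}),
]).rename_axis('row').reset_index()

# Total quantity per original row and color, one column per color.
per_row = long.groupby(['row', 'color'])['qty'].sum().unstack(fill_value=0)

# Running total down the rows, then attach to the original frame by index.
cums = per_row.cumsum().add_suffix('_cumsum')
result = df.join(cums)
print(result)
```

The groupby here is on the single color column of the long table, which is what makes it independent of whether the value originally sat in col_A or col_B.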