Groupby conditional sum of adjacent rows pandas
I have a dataframe that has been sorted by user and by time.
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})
   duration location user
0        10    house    A
1         5    house    A
2         5      gym    A
3         4      gym    B
4        10     shop    B
5         4      gym    B
6         6      gym    B
I only want to do the sum() when the 'location' fields are the same across adjacent rows for a given user, so it is not just df.groupby(['user','location']).duration.sum(). The desired output will look like the following. In addition, the order is important.
duration location user
      15    house    A
       5      gym    A
       4      gym    B
      10     shop    B
      10      gym    B
Thank you!
Supply sort=False to preserve the ordering between groups as it appeared in the original DF. Then compute the grouped sum of the duration column.
adj_check = (df.location != df.location.shift()).cumsum()
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()
The only change that needs to be made to what you've tried before is this condition, which groups all the similar successive rows into one unique group:
(df.location != df.location.shift()).cumsum()
0 1
1 1
2 2
3 2
4 3
5 4
6 4
Name: location, dtype: int32
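To see how that condition builds the groups, here is a minimal sketch that breaks it into steps on the question's data. The names `changed`, `run` and `result` are hypothetical, introduced only for illustration, and attaching the run id as a named helper column before grouping is just one way to write it, not necessarily how the answer above intended it.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
    'duration': [10, 5, 5, 4, 10, 4, 6],
})

# True wherever the location differs from the previous row.
# The first row is always True because shift() puts NaN there.
changed = df['location'] != df['location'].shift()

# The running total of those flags gives each run of identical locations its own id.
run = changed.cumsum()

# Group on user, location and the run id (hypothetical helper column 'run');
# sort=False keeps the groups in order of first appearance.
result = (
    df.assign(run=run)
      .groupby(['user', 'location', 'run'], sort=False)['duration']
      .sum()
      .reset_index()
      .drop(columns='run')
)
print(result)
```

Rows 2 and 3 share run id 2 because the location stays 'gym' across the user boundary; keeping 'user' among the group keys is what separates them. Naming the helper column also avoids any ambiguity with the existing 'location' column.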