Pandas groupby conditional sum of adjacent rows

Question

I have a dataframe that has been sorted by user and by time.

import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})


   duration location user
0        10    house    A
1         5    house    A
2         5      gym    A
3         4      gym    B
4        10     shop    B
5         4      gym    B
6         6      gym    B

I only want to do the sum() when the 'location' fields are the same across adjacent rows for a given user, so it is not just df.groupby(['user','location']).duration.sum(). The desired output looks like the following. The order of the rows is also important.

duration location user
      15    house    A
       5      gym    A
       4      gym    B
      10     shop    B
      10      gym    B

Thank you!

Answer 1

Supply sort=False to preserve the order in which the groups appear in the original DataFrame. Then compute the grouped sum of the duration column.

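# Key that increases by 1 each time 'location' differs from the previous row
adj_check = (df.location != df.location.shift()).cumsum()
# Group on user, location and the run key; sort=False keeps the original order
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()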

The only change needed to what you tried before is this condition, which puts each run of identical successive rows into its own group:

(df.location != df.location.shift()).cumsum()
0    1
1    1
2    2
3    2
4    3
5    4
6    4
Name: location, dtype: int32
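
Note that in the output above rows 2 and 3 share the key 2 even though they belong to different users; it is the 'user' column in the groupby that keeps them apart. As a variation (a minimal sketch, not part of the original answer), the run key can be built from both 'user' and 'location', so that the key alone identifies each run. The names changed, run_id and result are only illustrative, and the named aggregation used here assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})

keys = df[['user', 'location']]

# True wherever user or location differs from the previous row
# (the first row compares against NaN, so it always starts a run).
changed = (keys != keys.shift()).any(axis=1)

# Running count of changes: one integer per run of adjacent equal rows.
run_id = changed.cumsum()

# Sum duration within each run, keeping the run's user and location.
result = (df.groupby(run_id, sort=False)
            .agg(user=('user', 'first'),
                 location=('location', 'first'),
                 duration=('duration', 'sum'))
            .reset_index(drop=True))

print(result)
```

Because run_id already increases in row order, the result comes out in the same order as the original rows.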
