Take the sum of every N rows per group in a pandas DataFrame

Question

Just want to mention that this question is not a duplicate of this one: Take the sum of every N rows in pandas series

My problem is a bit different, as I want to aggregate over N rows per group. My current code looks like this:

import pandas as pd

df = pd.DataFrame({'ID':['AA','AA','AA','BB','BB','BB'], 
                   'DATE':['2021-01-01','2021-01-03','2021-01-08','2021-03-04','2021-03-06','2021-03-08'],
                   'VALUE':[10,15,25,40,60,90]})

df['DATE'] = pd.to_datetime(df['DATE'])
df = df.sort_values(by=['ID','DATE'])
df.head(10)

Sample DataFrame:

+----+------------+-------+
| ID | DATE       | VALUE |
+----+------------+-------+
| AA | 2021-01-01 |   10  |
+----+------------+-------+
| AA | 2021-01-03 |   15  |
+----+------------+-------+
| AA | 2021-01-08 |   25  |
+----+------------+-------+
| BB | 2021-03-04 |   40  |
+----+------------+-------+
| BB | 2021-03-06 |   60  |
+----+------------+-------+
| BB | 2021-03-08 |   90  |
+----+------------+-------+

I applied this preprocessing, based on that post:

#Calculate result
df.groupby(['ID', df.index//2]).agg({'VALUE':'mean', 'DATE':'median'}).reset_index()

I get this:

+----+-------+------------+
| ID | VALUE | DATE       |
+----+-------+------------+
| AA | 12.5  | 2021-01-02 |
+----+-------+------------+
| AA | 25    | 2021-01-08 |
+----+-------+------------+
| BB | 40    | 2021-03-04 |
+----+-------+------------+
| BB | 75    | 2021-03-07 |
+----+-------+------------+

But I want this:

+----+-------+------------+
| ID | VALUE | DATE       |
+----+-------+------------+
| AA | 12.5  | 2021-01-02 |
+----+-------+------------+
| AA | 25    | 2021-01-08 |
+----+-------+------------+
| BB | 50    | 2021-03-05 |
+----+-------+------------+
| BB | 90    | 2021-03-08 |
+----+-------+------------+

It seems that grouping on the pandas index does not work when my groups are not aligned to multiples of N: it messes up where the next group starts and how the aggregation happens. Any suggestions? By the way, my dates can be completely irregular.
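To see where the rows get mixed, here is a minimal sketch that prints the chunk label `df.index // 2` next to each row's ID. It assumes the question's sample data with its default 0..5 index; DATE is left out because it does not affect the labels.

```python
import pandas as pd

# Same IDs and values as the question's sample data (already sorted by ID)
df = pd.DataFrame({'ID': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB'],
                   'VALUE': [10, 15, 25, 40, 60, 90]})

N = 2
# Chunk label computed from the global row index: it ignores group boundaries
global_chunk = df.index // N

# Pair each row's ID with its chunk label to see which rows are grouped together
keys = list(zip(df['ID'], global_chunk.tolist()))
print(keys)
# [('AA', 0), ('AA', 0), ('AA', 1), ('BB', 1), ('BB', 2), ('BB', 2)]
```

Label 1 covers both the last AA row and the first BB row. Grouping by `['ID', label]` splits that chunk, so the first BB row ends up alone and every later BB chunk is shifted by one row.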

Answer 1

You can use groupby.cumcount to form a subgroup:

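N = 2  # number of rows per chunk
# Row position within each ID group (restarts at 0 for every ID), divided by N
group = df.groupby('ID').cumcount() // N

# Group by ID and chunk number, then drop the helper chunk level from the index
out = (df.groupby(['ID', group])
         .agg({'VALUE': 'mean', 'DATE': 'median'})
         .droplevel(1)
         .reset_index()
      )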

Output:

   ID  VALUE       DATE
0  AA   12.5 2021-01-02
1  AA   25.0 2021-01-08
2  BB   50.0 2021-03-05
3  BB   90.0 2021-03-08
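The title asks for the sum rather than the mean, and the same `cumcount`-based key works for any aggregation. Below is a minimal sketch, assuming the question's sample DataFrame, that prints the chunk key and then sums VALUE over every N rows within each ID. Because `cumcount` numbers rows by position, irregular dates do not affect the chunking; only the sort order does.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB'],
                   'DATE': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-08',
                                           '2021-03-04', '2021-03-06', '2021-03-08']),
                   'VALUE': [10, 15, 25, 40, 60, 90]})
df = df.sort_values(by=['ID', 'DATE'])

N = 2
# Position of each row inside its own ID group (0, 1, 2, ... restarting per ID),
# integer-divided by N so every N consecutive rows of a group share a chunk number
chunk = df.groupby('ID').cumcount() // N
print(chunk.tolist())  # [0, 0, 1, 0, 0, 1]

# Sum VALUE over each chunk of N rows within each ID
sums = (df.groupby(['ID', chunk])['VALUE']
          .sum()
          .droplevel(1)
          .reset_index())
print(sums)
```

For this data the sums are 25 and 25 for AA, and 100 and 90 for BB.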

Answer 1 (2 votes; accepted 2022-07-23 07:56:24)

取 pandas DataFrame 中每组每 N 行的总和

问题描述

1 个解决方案

解决方案1 2 已采纳 2022-07-23 07:56:24

解决方案1
2 已采纳 2022-07-23 07:56:24