Take the sum of every N rows per group in a pandas DataFrame
I just want to mention that this question is not a duplicate of this one: Take the sum of every N rows in pandas series
My problem is a bit different, as I want to aggregate every N rows per group. My current code looks like this:
import pandas as pd
df = pd.DataFrame({'ID':['AA','AA','AA','BB','BB','BB'],
'DATE':['2021-01-01','2021-01-03','2021-01-08','2021-03-04','2021-03-06','2021-03-08'],
'VALUE':[10,15,25,40,60,90]})
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.sort_values(by=['ID','DATE'])
df.head(10)
Sample DataFrame:
+----+------------+-------+
| ID | DATE | VALUE |
+----+------------+-------+
| AA | 2021-01-01 | 10 |
+----+------------+-------+
| AA | 2021-01-03 | 15 |
+----+------------+-------+
| AA | 2021-01-08 | 25 |
+----+------------+-------+
| BB | 2021-03-04 | 40 |
+----+------------+-------+
| BB | 2021-03-06 | 60 |
+----+------------+-------+
| BB | 2021-03-08 | 90 |
+----+------------+-------+
Based on that post, I apply this aggregation:
# Calculate result
df.groupby(['ID', df.index//2]).agg({'VALUE':'mean', 'DATE':'median'}).reset_index()
I get this:
+----+-------+------------+
| ID | VALUE | DATE |
+----+-------+------------+
| AA | 12.5 | 2021-01-02 |
+----+-------+------------+
| AA | 25 | 2021-01-08 |
+----+-------+------------+
| BB | 40 | 2021-03-04 |
+----+-------+------------+
| BB | 75 | 2021-03-07 |
+----+-------+------------+
But I want this:
+----+-------+------------+
| ID | VALUE | DATE |
+----+-------+------------+
| AA | 12.5 | 2021-01-02 |
+----+-------+------------+
| AA | 25 | 2021-01-08 |
+----+-------+------------+
| BB | 50 | 2021-03-05 |
+----+-------+------------+
| BB | 90 | 2021-03-08 |
+----+-------+------------+
It seems like the pandas index does not work well when my groups are not perfectly aligned: the row counter carries over into the next group, which messes up the beginning of that group and how the aggregations happen. Any suggestions? My dates can be completely irregular, by the way.
You can use groupby.cumcount to form a subgroup:
N = 2
group = df.groupby('ID').cumcount()//N
out = (df.groupby(['ID', group])
.agg({'VALUE':'mean', 'DATE':'median'})
.droplevel(1).reset_index()
)
Output:
   ID  VALUE       DATE
0  AA   12.5 2021-01-02
1  AA   25.0 2021-01-08
2  BB   50.0 2021-03-05
3  BB   90.0 2021-03-08
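To see why this works, it may help to compare the two sets of subgroup labels side by side. The sketch below is only an illustration on the sample data from the question; the column names global_label and per_group_label are made up for this example and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB'],
    'VALUE': [10, 15, 25, 40, 60, 90],
})

N = 2

# Labels derived from the global row index: the counter keeps running
# across IDs, so the first BB row shares label 1 with the last AA row.
global_label = list(df.index // N)

# Labels derived from cumcount: the counter restarts at 0 for every ID,
# so each ID is chunked into blocks of N rows independently.
per_group_label = df.groupby('ID').cumcount() // N

# Hypothetical column names, used only to display both labelings together.
labels = df.assign(global_label=global_label,
                   per_group_label=per_group_label)
print(labels)
```

With the index-based labels, BB is split into a block of one row followed by a block of two, which is exactly the misalignment seen in the question. With the cumcount-based labels, BB starts again at block 0, so a group whose size is not a multiple of N only ever has a short block at its end.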