Sum values in a column grouped by another column in pandas
My df looks like this:
country id x y
AT 11 50 100
AT 12 NaN 90
AT 13 NaN 104
AT 22 40 50
AT 23 30 23
AT 61 40 88
AT 62 NaN 78
UK 11 40 34
UK 12 NaN 22
UK 13 NaN 70
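What I need is the sum of column y placed in the first row whose x is not NaN, grouped by the first digit of the id column, separately for each country. At the end, I just need to drop the rows where x is NaN.
The result should look like this: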
country id x y
AT 11 50 294
AT 22 40 50
AT 23 30 23
AT 61 40 166
UK 11 40 126
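To reproduce the example, the sample frame can be built roughly as follows. This is only a sketch: using numpy.nan for the missing entries and integer values for id and y are assumptions, since the question shows printed output rather than construction code.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question. NaN in x marks a row whose y
# should be added to the nearest preceding row that has a value in x.
df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})
print(df)
```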
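You can aggregate with GroupBy.agg, using the first and sum functions, together with a helper Series built by testing for non-missing values with Series.notna and then taking the cumulative sum with Series.cumsum: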
df1 = (df.groupby(['country', df['x'].notna().cumsum()])
         .agg({'id':'first', 'x':'first', 'y':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index())
print(df1)
country id x y
0 AT 11 50.0 294
1 AT 22 40.0 50
2 AT 23 30.0 23
3 AT 61 40.0 166
4 UK 11 40.0 126
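To see why the cumulative sum works as a grouping key, it may help to print the helper Series next to the data. The sketch below rebuilds the question's sample frame (the construction is an assumption) and uses the illustrative names starts and helper, which do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})

# True where x holds a value; every True marks the start of a new block.
starts = df["x"].notna()
# The running count of True values gives each row the number of its block,
# so NaN rows inherit the label of the last row that had a value in x.
helper = starts.cumsum()
print(df.assign(starts=starts, helper=helper))

# Summing y per (country, block) yields the totals shown in the output above.
totals = df.groupby(["country", helper])["y"].sum()
print(totals)
```

Rows 0 to 2 share label 1, row 3 gets 2, row 4 gets 3, and so on, so each run of NaN rows is folded into the row that precedes it.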
If the first value(s) of x may be missing, add DataFrame.dropna:
print(df)
country id x y
0 AT 11 NaN 100
1 AT 11 50.0 100
2 AT 12 NaN 90
3 AT 13 NaN 104
4 AT 22 40.0 50
5 AT 23 30.0 23
6 AT 61 40.0 88
7 AT 62 NaN 78
8 UK 11 40.0 34
9 UK 12 NaN 22
10 UK 13 NaN 70
df1 = (df.groupby(['country', df['x'].notna().cumsum()])
         .agg({'id':'first', 'x':'first', 'y':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index()
         .dropna(subset=['x']))
print(df1)
country id x y
1 AT 11 50.0 294
2 AT 22 40.0 50
3 AT 23 30.0 23
4 AT 61 40.0 166
5 UK 11 40.0 126
Use groupby, transform and dropna:
print(df.assign(y=df.groupby(df["x"].notnull().cumsum())["y"].transform('sum'))
        .dropna(subset=["x"]))
country id x y
0 AT 11 50.0 294
3 AT 22 40.0 50
4 AT 23 30.0 23
5 AT 61 40.0 166
7 UK 11 40.0 126
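Note that this version groups on the cumulative sum alone, which gives the right answer here because every country starts with a non-missing x. If a country could begin with a NaN row, that row's y would be added to the previous country's last block. A possible variant, sketched below on hypothetical data (the frame small and the names naive and safe are made up for illustration), adds country to the grouping keys:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the UK block starts with a NaN in x.
small = pd.DataFrame({
    "country": ["AT", "AT", "UK", "UK", "UK"],
    "id": [11, 12, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, np.nan],
    "y": [100, 90, 7, 34, 22],
})

helper = small["x"].notna().cumsum()

# Grouping on the helper alone: the first UK row leaks into the AT block.
naive = (small.assign(y=small.groupby(helper)["y"].transform("sum"))
              .dropna(subset=["x"]))

# Grouping on country and helper keeps the blocks separate per country.
safe = (small.assign(y=small.groupby(["country", helper])["y"].transform("sum"))
             .dropna(subset=["x"]))

print(naive)
print(safe)
```

In the variant, the leading UK row forms its own block with no value in x, so dropna removes it and its y is not counted toward either country.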