How can I create a new column of values based on the grouped sum of values from two other columns?
First of all, I apologize if my question is poorly worded.
I will use an example dataframe to illustrate my problem.
import pandas as pd

medals = pd.DataFrame({
    'Year': [2010,2010,2010,2010,2010,2010,2014,2014,2014,2014,2014,2014,2018,2018,2018,2018,2018,2018],
    'Country': ['Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany'],
    'Sex': ['female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male'],
    'No. of medals': [2,4,2,0,3,0,1,1,3,2,4,4,1,3,2,2,1,3],
})
Say I have this dataframe of countries and the number of medals they won at the Olympics:
    Year  Country     Sex  No. of medals
0   2010   Canada  female              2
1   2010   Canada    male              4
2   2010      USA  female              2
3   2010      USA    male              0
4   2010  Germany  female              3
5   2010  Germany    male              0
6   2014   Canada  female              1
7   2014   Canada    male              1
8   2014      USA  female              3
9   2014      USA    male              2
10  2014  Germany  female              4
11  2014  Germany    male              4
12  2018   Canada  female              1
13  2018   Canada    male              3
14  2018      USA  female              2
15  2018      USA    male              2
16  2018  Germany  female              1
17  2018  Germany    male              3
Now say I want to add a column showing the total number of medals each country won in that year:
    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4
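How would I go about doing this? I have already grouped by country and year and taken the sum, but I am not sure how to map the result back onto the Year and Country columns.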
medals.groupby(['Year','Country'])['No. of medals'].sum()
This gives me:
Year  Country
2010  Canada     6
      Germany    3
      USA        2
2014  Canada     2
      Germany    8
      USA        5
2018  Canada     4
      Germany    4
      USA        4
Name: No. of medals, dtype: int64
Any tips and pointers are greatly appreciated. Thanks!
Use groupby with transform:
medals['Total medals']=medals.groupby(['Country','Year'])['No. of medals'].transform('sum')
print(medals)
    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4
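For what it's worth, the reason this works where a plain .sum() does not: sum() collapses each group to a single row, while transform('sum') returns a result with the same index as the original dataframe, so it can be assigned straight to a new column. A minimal sketch of the difference, using only the 2010 Canada and USA rows from the question (the names grouped, per_group and per_row are just for illustration):

```python
import pandas as pd

# Subset of the question's data: 2010, Canada and USA only
medals = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2010],
    'Country': ['Canada', 'Canada', 'USA', 'USA'],
    'Sex': ['female', 'male', 'female', 'male'],
    'No. of medals': [2, 4, 2, 0],
})

grouped = medals.groupby(['Year', 'Country'])['No. of medals']

# sum() aggregates: one row per (Year, Country) group
per_group = grouped.sum()

# transform('sum') broadcasts each group's sum back to every row of that group,
# keeping the original row index
per_row = grouped.transform('sum')

medals['Total medals'] = per_row
print(medals)
```

Because per_row shares the index of medals, the assignment lines up row by row with no merge or map step.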
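You almost had it. Turn the grouped sum back into a regular dataframe, rename the summed column, and merge it onto the original on Year and Country: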
>>> medals_sum = medals.groupby(["Year", "Country"])["No. of medals"].sum().reset_index()
>>> medals_sum = medals_sum.rename(columns={"No. of medals": "Total medals"})
>>> medals.merge(medals_sum, on=["Year", "Country"])
    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4
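A possible variation on the same idea, offered as a sketch rather than the one right way: since the grouped sum is already indexed by (Year, Country), DataFrame.join with on= can look up those two columns against that index directly, which skips the reset_index step. The names totals and result are mine, and the frame below is a subset of the question's data.

```python
import pandas as pd

# Subset of the question's data: Canada 2010, Canada 2014, USA 2014
medals = pd.DataFrame({
    'Year': [2010, 2010, 2014, 2014, 2014, 2014],
    'Country': ['Canada', 'Canada', 'Canada', 'Canada', 'USA', 'USA'],
    'Sex': ['female', 'male', 'female', 'male', 'female', 'male'],
    'No. of medals': [2, 4, 1, 1, 3, 2],
})

# Grouped sum as a Series indexed by (Year, Country), named for the new column
totals = (
    medals.groupby(['Year', 'Country'])['No. of medals']
    .sum()
    .rename('Total medals')
)

# Left join: match the Year and Country columns of medals
# against the two index levels of totals
result = medals.join(totals, on=['Year', 'Country'])
print(result)
```

join returns a new dataframe and leaves medals unchanged, so assign the result back if you want to keep the column.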