How can I create a new column of values based on the grouped sum of values from two other columns?
First of all, apologies if my question isn't worded well.
I'll use an example dataframe to illustrate my question.
import pandas as pd
medals = pd.DataFrame({'Year':[2010,2010,2010,2010,2010,2010,2014,2014,2014,2014,2014,2014,2018,2018,2018,2018,2018,2018],'Country': ['Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany'],'Sex': ['female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male'],
'No. of medals': [2,4,2,0,3,0,1,1,3,2,4,4,1,3,2,2,1,3]})
Say I have this dataframe of countries and the number of medals they won at the Olympics:
Year Country Sex No. of medals
0 2010 Canada female 2
1 2010 Canada male 4
2 2010 USA female 2
3 2010 USA male 0
4 2010 Germany female 3
5 2010 Germany male 0
6 2014 Canada female 1
7 2014 Canada male 1
8 2014 USA female 3
9 2014 USA male 2
10 2014 Germany female 4
11 2014 Germany male 4
12 2018 Canada female 1
13 2018 Canada male 3
14 2018 USA female 2
15 2018 USA male 2
16 2018 Germany female 1
17 2018 Germany male 3
Now say I want to add a column showing the total number of medals that country won in that year:
Year Country Sex No. of medals Total medals
0 2010 Canada female 2 6
1 2010 Canada male 4 6
2 2010 USA female 2 2
3 2010 USA male 0 2
4 2010 Germany female 3 3
5 2010 Germany male 0 3
6 2014 Canada female 1 2
7 2014 Canada male 1 2
8 2014 USA female 3 5
9 2014 USA male 2 5
10 2014 Germany female 4 8
11 2014 Germany male 4 8
12 2018 Canada female 1 4
13 2018 Canada male 3 4
14 2018 USA female 2 4
15 2018 USA male 2 4
16 2018 Germany female 1 4
17 2018 Germany male 3 4
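How would I go about doing this? I've already grouped by country and year and got the sums, but I'm not sure how to map those sums back onto the Year and Country columns.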
medals.groupby(['Year','Country'])['No. of medals'].sum()
which gives me this:
Year Country
2010 Canada 6
Germany 3
USA 2
2014 Canada 2
Germany 8
USA 5
2018 Canada 4
Germany 4
USA 4
Name: No. of medals, dtype: int64
Any tips and pointers are much appreciated. Thanks!
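Use groupby with transform. transform('sum') returns one value for every original row (the sum of that row's group), so the result can be assigned directly as a new column: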
medals['Total medals']=medals.groupby(['Country','Year'])['No. of medals'].transform('sum')
print(medals)
Year Country Sex No. of medals Total medals
0 2010 Canada female 2 6
1 2010 Canada male 4 6
2 2010 USA female 2 2
3 2010 USA male 0 2
4 2010 Germany female 3 3
5 2010 Germany male 0 3
6 2014 Canada female 1 2
7 2014 Canada male 1 2
8 2014 USA female 3 5
9 2014 USA male 2 5
10 2014 Germany female 4 8
11 2014 Germany male 4 8
12 2018 Canada female 1 4
13 2018 Canada male 3 4
14 2018 USA female 2 4
15 2018 USA male 2 4
16 2018 Germany female 1 4
17 2018 Germany male 3 4
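To see why transform can be assigned straight to a new column while the plain sum cannot, it helps to compare the two results side by side. A minimal sketch, using the same data as the question written more compactly; the names totals and broadcast are only illustrative:

```python
import pandas as pd

# Same data as in the question, built with list repetition
medals = pd.DataFrame({
    'Year': [2010] * 6 + [2014] * 6 + [2018] * 6,
    'Country': ['Canada', 'Canada', 'USA', 'USA', 'Germany', 'Germany'] * 3,
    'Sex': ['female', 'male'] * 9,
    'No. of medals': [2, 4, 2, 0, 3, 0, 1, 1, 3, 2, 4, 4, 1, 3, 2, 2, 1, 3],
})

grouped = medals.groupby(['Year', 'Country'])['No. of medals']

# .sum(): one row per (Year, Country) pair, indexed by a MultiIndex
totals = grouped.sum()

# .transform('sum'): the same group sums, repeated for every original row
# and indexed exactly like medals, so it lines up row for row
broadcast = grouped.transform('sum')

print(len(totals), len(broadcast))           # 9 18
print(broadcast.index.equals(medals.index))  # True
```

Because broadcast shares medals' index, assigning it with medals['Total medals'] = ... needs no extra alignment step.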
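You almost had it. Reset the index of the grouped sum, rename the column, and merge it back on Year and Country: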
>>> medals_sum = medals.groupby(["Year", "Country"])["No. of medals"].sum().reset_index()
>>> medals_sum = medals_sum.rename(columns={"No. of medals": "Total medals"})
>>> medals.merge(medals_sum, on=["Year", "Country"])
Year Country Sex No. of medals Total medals
0 2010 Canada female 2 6
1 2010 Canada male 4 6
2 2010 USA female 2 2
3 2010 USA male 0 2
4 2010 Germany female 3 3
5 2010 Germany male 0 3
6 2014 Canada female 1 2
7 2014 Canada male 1 2
8 2014 USA female 3 5
9 2014 USA male 2 5
10 2014 Germany female 4 8
11 2014 Germany male 4 8
12 2018 Canada female 1 4
13 2018 Canada male 3 4
14 2018 USA female 2 4
15 2018 USA male 2 4
16 2018 Germany female 1 4
17 2018 Germany male 3 4
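If you already have the grouped Series from the question, a variant that skips reset_index and the column rename is DataFrame.join with on=, which matches the Year and Country columns of medals against the levels of the Series' MultiIndex. A sketch, assuming the same data as in the question; the Series has to be named, because join uses that name for the new column:

```python
import pandas as pd

# Same data as in the question, built with list repetition
medals = pd.DataFrame({
    'Year': [2010] * 6 + [2014] * 6 + [2018] * 6,
    'Country': ['Canada', 'Canada', 'USA', 'USA', 'Germany', 'Germany'] * 3,
    'Sex': ['female', 'male'] * 9,
    'No. of medals': [2, 4, 2, 0, 3, 0, 1, 1, 3, 2, 4, 4, 1, 3, 2, 2, 1, 3],
})

# The grouped sum from the question, named so join knows what to call the column
totals = (
    medals.groupby(['Year', 'Country'])['No. of medals']
    .sum()
    .rename('Total medals')
)

# Look up each row's (Year, Country) pair in the MultiIndex of totals
result = medals.join(totals, on=['Year', 'Country'])
print(result)
```

join defaults to how='left', so every row of medals is kept; a (Year, Country) pair missing from totals would simply get NaN.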