How can I create a new column of values based on the grouped sum of values from two other columns?

First of all, I apologize if my question is not worded well.

I'll use an example dataframe to illustrate my question.

import pandas as pd

medals = pd.DataFrame({
    'Year': [2010,2010,2010,2010,2010,2010,2014,2014,2014,2014,2014,2014,2018,2018,2018,2018,2018,2018],
    'Country': ['Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany','Canada','Canada','USA','USA','Germany','Germany'],
    'Sex': ['female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male','female','male'],
    'No. of medals': [2,4,2,0,3,0,1,1,3,2,4,4,1,3,2,2,1,3],
})

Suppose I have this dataframe of countries and the number of medals they won at the Olympics:

    Year  Country     Sex  No. of medals
0   2010   Canada  female              2
1   2010   Canada    male              4
2   2010      USA  female              2
3   2010      USA    male              0
4   2010  Germany  female              3
5   2010  Germany    male              0
6   2014   Canada  female              1
7   2014   Canada    male              1
8   2014      USA  female              3
9   2014      USA    male              2
10  2014  Germany  female              4
11  2014  Germany    male              4
12  2018   Canada  female              1
13  2018   Canada    male              3
14  2018      USA  female              2
15  2018      USA    male              2
16  2018  Germany  female              1
17  2018  Germany    male              3 

Suppose I want to add a column showing the total number of medals each country won in that year:

    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4

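How would I go about doing this? I have already grouped by country and year and computed the sums, but I'm not sure how to map them back onto the rows by Year and Country.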

medals.groupby(['Year','Country'])['No. of medals'].sum()

gives me this:

Year  Country
2010  Canada     6
      Germany    3
      USA        2
2014  Canada     2
      Germany    8
      USA        5
2018  Canada     4
      Germany    4
      USA        4
Name: No. of medals, dtype: int64

Any tips and pointers are greatly appreciated. Thank you!

Use groupby with transform:

medals['Total medals']=medals.groupby(['Country','Year'])['No. of medals'].transform('sum')
print(medals)




    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4
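
A minimal sketch of why transform fits here, assuming a shortened subset of the question's data (the variable names per_group and per_row are only illustrative): sum() returns one value per group, while transform('sum') returns one value per original row, so it can be assigned straight back as a column.

```python
import pandas as pd

# Shortened subset of the question's data, to keep the sketch small.
medals = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2010, 2014, 2014],
    'Country': ['Canada', 'Canada', 'USA', 'USA', 'Canada', 'Canada'],
    'Sex': ['female', 'male', 'female', 'male', 'female', 'male'],
    'No. of medals': [2, 4, 2, 0, 1, 1],
})

grouped = medals.groupby(['Country', 'Year'])['No. of medals']

# sum() collapses each group to a single row indexed by (Country, Year).
per_group = grouped.sum()

# transform('sum') broadcasts each group's sum back to every row of that
# group, so the result shares the index of the original dataframe.
per_row = grouped.transform('sum')

medals['Total medals'] = per_row
print(len(per_group), len(per_row))
print(medals)
```

Because the transformed series is aligned on the original index, no merge or explicit mapping step is needed.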

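You almost had it. Turn the grouped sum back into a regular dataframe, rename the summed column, and merge it onto the original on Year and Country: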

>>> medals_sum = medals.groupby(["Year", "Country"])["No. of medals"].sum().reset_index()
>>> medals_sum = medals_sum.rename(columns={"No. of medals": "Total medals"})
>>> medals.merge(medals_sum, on=["Year", "Country"])

    Year  Country     Sex  No. of medals  Total medals
0   2010   Canada  female              2             6
1   2010   Canada    male              4             6
2   2010      USA  female              2             2
3   2010      USA    male              0             2
4   2010  Germany  female              3             3
5   2010  Germany    male              0             3
6   2014   Canada  female              1             2
7   2014   Canada    male              1             2
8   2014      USA  female              3             5
9   2014      USA    male              2             5
10  2014  Germany  female              4             8
11  2014  Germany    male              4             8
12  2018   Canada  female              1             4
13  2018   Canada    male              3             4
14  2018      USA  female              2             4
15  2018      USA    male              2             4
16  2018  Germany  female              1             4
17  2018  Germany    male              3             4
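
A possible variant of the merge approach, sketched on a shortened subset of the question's data (the names totals and result are illustrative, and the extra merge arguments are a suggestion rather than part of the answer above): named aggregation produces the renamed column in one step, and how='left' with validate='many_to_one' makes the intent of the merge explicit.

```python
import pandas as pd

# Shortened subset of the question's data, to keep the sketch small.
medals = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2010, 2014, 2014],
    'Country': ['Canada', 'Canada', 'USA', 'USA', 'Canada', 'Canada'],
    'Sex': ['female', 'male', 'female', 'male', 'female', 'male'],
    'No. of medals': [2, 4, 2, 0, 1, 1],
})

# as_index=False keeps Year and Country as ordinary columns; the dict is
# unpacked because 'Total medals' is not a valid Python keyword name.
totals = medals.groupby(['Year', 'Country'], as_index=False).agg(
    **{'Total medals': ('No. of medals', 'sum')}
)

# how='left' keeps every row of medals in its original order;
# validate='many_to_one' raises if totals ever has duplicate keys.
result = medals.merge(
    totals, on=['Year', 'Country'], how='left', validate='many_to_one'
)
print(result)
```

For a single derived column the transform approach is shorter; a merge like this may be more convenient when several aggregated columns need to be attached at once.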
