How to sum values of one column based on other columns in pandas?
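I'm supposed to work out which country has scored the most goals in tournaments since 2010. So far I've managed to manipulate the dataframe by filtering out friendlies like this: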
no_friendlies = df[df.tournament != "Friendly"]
Then I set the date column as the index so that I could filter out all matches played before 2010:
no_friendlies_indexed = no_friendlies.set_index('date')
since_2010 = no_friendlies_indexed.loc['2010-01-01':]
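The same two filters can also be written as one boolean mask on a parsed datetime column, without the set_index step. This is a minimal sketch, assuming the date column holds ISO-format strings like the sample data; the small frame and team names below are made up for illustration:

```python
import pandas as pd

# Hypothetical miniature of the match data; the real table has many more rows.
df = pd.DataFrame({
    "date": ["2009-06-01", "2010-03-03", "2014-06-12", "2018-06-14"],
    "home_team": ["A", "B", "C", "A"],
    "away_team": ["B", "C", "A", "C"],
    "home_score": [1, 2, 3, 0],
    "away_score": [0, 1, 1, 2],
    "tournament": ["Friendly", "FIFA World Cup qualification",
                   "FIFA World Cup", "Friendly"],
})

# Parse the strings into real datetimes so comparisons are chronological.
df["date"] = pd.to_datetime(df["date"])

# One mask: non-friendly matches played on or after 1 January 2010.
mask = (df["tournament"] != "Friendly") & (df["date"] >= "2010-01-01")
since_2010 = df[mask]
print(since_2010[["date", "home_team", "away_team"]])
```

Only the 2010 qualifier and the 2014 match survive here: the 2009 row is too early and the other two rows are friendlies.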
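From this point on I'm lost, because I don't know how to add up the home and away goals for each country.
Any help/advice is appreciated!
Edit:
Text version of the sample data: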
date home_team away_team home_score away_score tournament city country neutral
0 1872-11-30 Scotland England 0 0 Friendly Glasgow Scotland False
1 1873-03-08 England Scotland 4 2 Friendly London England False
2 1874-03-07 Scotland England 2 1 Friendly Glasgow Scotland False
3 1875-03-06 England Scotland 2 2 Friendly London England False
4 1876-03-04 Scotland England 3 0 Friendly Glasgow Scotland False
5 1876-03-25 Scotland Wales 4 0 Friendly Glasgow Scotland False
6 1877-03-03 England Scotland 1 3 Friendly London England False
7 1877-03-05 Wales Scotland 0 2 Friendly Wrexham Wales False
8 1878-03-02 Scotland England 7 2 Friendly Glasgow Scotland False
9 1878-03-23 Scotland Wales 9 0 Friendly Glasgow Scotland False
10 1879-01-18 England Wales 2 1 Friendly London England False
Edit 2:
I just tried doing this:
since_2010.groupby(['home_team', 'home_score']).sum()
But it doesn't return the sum of the home team's goals (if it did, I'd repeat it for the away teams to get the total).
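To see why this attempt doesn't produce per-team totals, here is a small comparison built from the home_team, home_score and away_score values of the first four sample rows (away_team is left out only to keep the printout simple). Grouping by two columns turns every distinct (team, score) pair into its own group, whereas grouping by the team alone and then selecting the score column sums the goals:

```python
import pandas as pd

# home_team, home_score and away_score from the first four sample rows.
df = pd.DataFrame({
    "home_team": ["Scotland", "England", "Scotland", "England"],
    "home_score": [0, 4, 2, 2],
    "away_score": [0, 2, 1, 2],
})

# Grouping by team AND score: home_score moves into the index
# instead of being summed, giving one row per (team, score) pair.
by_pair = df.groupby(["home_team", "home_score"]).sum()
print(by_pair)

# Grouping by team only, then selecting the column to sum.
home_goals = df.groupby("home_team")["home_score"].sum()
print(home_goals)
```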
Use .groupby and .sum() for the home team, then do the same for the away team and add the two together:
df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()
output:
England 12
Scotland 34
Wales 1
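One caveat with adding the two Series with +: pandas aligns them on the team name, so a team that appears only as a home team (or only as an away team) in the filtered data ends up as NaN. Series.add with fill_value=0 treats the missing side as zero goals instead. A sketch with made-up teams and scores:

```python
import pandas as pd

# Hypothetical data: "B" only ever plays at home, "C" only away.
df = pd.DataFrame({
    "home_team": ["A", "B", "A"],
    "away_team": ["C", "A", "C"],
    "home_score": [2, 1, 3],
    "away_score": [0, 1, 1],
})

home = df.groupby("home_team")["home_score"].sum()
away = df.groupby("away_team")["away_score"].sum()

# Plain "+" aligns on the index: B and C are missing from one side -> NaN.
print(home + away)

# fill_value=0 fills the missing side with 0 before adding.
totals = home.add(away, fill_value=0)
print(totals)
```

Because of the alignment step the result is usually float64; call .astype(int) afterwards if you want integer goal counts.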
A more detailed explanation (as requested in the comments):
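In my answer I .groupby a single column, home_team. In your attempt you grouped by ['home_team', 'home_score'], but your goal (no pun intended) is the .sum() of home_score, so you shouldn't .groupby() that column. As you can see, I put ['home_score'] after the .groupby(...) part so that I can take its .sum(). That covers the home team; then do the same for away_team. Because the home_team and away_team results are both indexed by country, you can simply add them together, and pandas lines the two Series up by country name.
Another option is to reshape with pd.wide_to_long. The benefit is that it automatically creates a 'home_or_away' indicator, but first we'll rename the columns so they read 'score_home' (instead of 'home_score'), because wide_to_long expects each column name to start with the stub name.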
import pandas as pd

# Swap column stubs around '_' (home_team -> team_home, home_score -> score_home)
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]
# Your filtering code; it would drop every row of the example you posted
# df['date'] = pd.to_datetime(df['date'])
# df = df[df['date'].dt.year.ge(2010) & df['tournament'].ne('Friendly')]
df = pd.wide_to_long(df, i='date', j='home_or_away',
stubnames=['team', 'score'], sep='_', suffix='.*')
# country neutral tournament city team score
#date home_or_away
#1872-11-30 home Scotland False Friendly Glasgow Scotland 0
#1873-03-08 home England False Friendly London England 4
#1874-03-07 home Scotland False Friendly Glasgow Scotland 2
#...
#1878-03-02 away Scotland False Friendly Glasgow England 2
#1878-03-23 away Scotland False Friendly Glasgow Wales 0
#1879-01-18 away England False Friendly London Wales 1
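One caveat with i='date': pd.wide_to_long requires the i column(s) to uniquely identify each row, and a full international results table will typically contain several matches on the same day. A minimal sketch that uses a positional match id instead; the column name match_id, the teams and the scores below are all made up for illustration:

```python
import pandas as pd

# Hypothetical rows: two matches played on the same date.
df = pd.DataFrame({
    "date": ["2018-06-14", "2018-06-14"],
    "home_team": ["A", "C"],
    "away_team": ["B", "D"],
    "home_score": [5, 0],
    "away_score": [0, 1],
    "tournament": ["Cup", "Cup"],
})

# Same column swap as in the answer: home_team -> team_home, etc.
df.columns = ["_".join(x[::-1]) for x in df.columns.str.split("_")]

# With i="date" the call raises, because the id column is not unique per row.
try:
    pd.wide_to_long(df, stubnames=["team", "score"], i="date",
                    j="home_or_away", sep="_", suffix=".*")
    date_error = None
except ValueError as err:
    date_error = str(err)
print("i='date' ->", date_error)

# Use a fresh 0..n-1 match id (hypothetical name) as the row identifier.
df = df.reset_index().rename(columns={"index": "match_id"})
long_df = pd.wide_to_long(df, stubnames=["team", "score"], i="match_id",
                          j="home_or_away", sep="_", suffix=".*")
totals = long_df.groupby("team")["score"].sum()
print(totals)
```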
So now you can sum the goals per team, whether they were scored at home or away:
df.groupby('team')['score'].sum()
#team
#England 12
#Scotland 34
#Wales 1
#Name: score, dtype: int64
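The question ultimately asks which country scored the most. Once you have the per-team totals from either answer, Series.idxmax returns the label of the largest value, and sort_values gives the full ranking. A small sketch that rebuilds the totals shown above by hand:

```python
import pandas as pd

# Per-team totals, copied from the sample output above.
totals = pd.Series({"England": 12, "Scotland": 34, "Wales": 1}, name="score")

# Label of the highest total (on a tie, idxmax returns the first one).
top_team = totals.idxmax()

# Full ranking, highest scorer first.
ranking = totals.sort_values(ascending=False)
print(top_team)
print(ranking)
```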