
How to sum values of one column based on other columns in pandas?

I am working with a dataframe like the one shown below (a text version is included further down).

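I am supposed to work out which country has scored the most goals in tournaments since 2010. So far I have managed to filter the friendlies out of the dataframe like this: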

no_friendlies = df[df.tournament != "Friendly"]

Then I set the date column as the index so that I could filter out all matches before 2010:

no_friendlies_indexed = no_friendlies.set_index('date')
since_2010 = no_friendlies_indexed.loc['2010-01-01':]
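
As a possible alternative to slicing on a string index, the two filters can be combined into one boolean mask once the dates are parsed. This is only a minimal sketch, assuming the column names from the sample data; the rows and team names below are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the question's data.
df = pd.DataFrame({
    "date": ["2009-11-18", "2010-06-11", "2012-03-01", "2014-07-13"],
    "home_team": ["A", "B", "A", "C"],
    "away_team": ["B", "C", "C", "A"],
    "home_score": [1, 2, 3, 1],
    "away_score": [1, 0, 1, 4],
    "tournament": ["Cup", "Cup", "Friendly", "Cup"],
})

# Parse the strings into timestamps so the comparison is chronological.
df["date"] = pd.to_datetime(df["date"])

# Combine both conditions in one boolean mask; the parentheses are required
# because & binds more tightly than the comparison operators.
mask = (df["tournament"] != "Friendly") & (df["date"] >= pd.Timestamp("2010-01-01"))
since_2010 = df[mask]
print(since_2010)
```

With this approach the date stays an ordinary column, so it remains available for later grouping or reshaping.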

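From this point on I am lost, because I don't know how to count the goals each country scored both at home and away.

Any help or advice is appreciated!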

Edit:

Text version of the sample data:

date    home_team   away_team   home_score  away_score  tournament  city    country     neutral
0   1872-11-30  Scotland    England     0   0       Friendly    Glasgow     Scotland    False
1   1873-03-08  England     Scotland    4   2       Friendly    London  England     False
2   1874-03-07  Scotland    England     2   1       Friendly    Glasgow     Scotland    False
3   1875-03-06  England     Scotland    2   2       Friendly    London  England     False
4   1876-03-04  Scotland    England     3   0       Friendly    Glasgow     Scotland    False
5   1876-03-25  Scotland    Wales       4   0       Friendly    Glasgow     Scotland    False
6   1877-03-03  England     Scotland    1   3       Friendly    London  England     False
7   1877-03-05  Wales       Scotland    0   2       Friendly    Wrexham     Wales   False
8   1878-03-02  Scotland    England     7   2       Friendly    Glasgow     Scotland    False
9   1878-03-23  Scotland    Wales       9   0       Friendly    Glasgow     Scotland    False
10  1879-01-18  England     Wales       2   1       Friendly    London  England     False

Edit 2:

I just tried doing this:

since_2010.groupby(['home_team', 'home_score']).sum()

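But it does not return the sum of the home teams' scores (if it had worked, I would have repeated it for the away teams to get the total score).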

Use .groupby(...).sum() for the home teams, then do the same for the away teams and add the two results together:

df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()

output:

England     12
Scotland    34
Wales        1

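A more detailed explanation (per the comments):

  1. You only need to .groupby one column, home_team. In your attempt you grouped by ['home_team', 'home_score']. Your goal (no pun intended) is to get the .sum() of home_score, so you should not .groupby() it. As you can see, ['home_score'] comes after the part where I use .groupby, so that I can take its .sum(). That takes care of the home teams.
  2. Then you do the same for away_team.
  3. At that point pandas is smart enough to line the two results up: because the home_team and away_team groups are both labelled by country, you can simply add them together.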
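
One caveat with the plain + in the steps above: pandas aligns the two results on the team labels, so a team that appears only at home or only away ends up as NaN. A minimal sketch of one way to handle that, using Series.add with fill_value=0; the rows and team names are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Made-up matches for illustration; team "C" never plays at home
# and team "A" never plays away.
df = pd.DataFrame({
    "home_team": ["A", "B", "A"],
    "away_team": ["B", "C", "C"],
    "home_score": [2, 1, 3],
    "away_score": [0, 4, 2],
})

home = df.groupby("home_team")["home_score"].sum()
away = df.groupby("away_team")["away_score"].sum()

# Plain + aligns on the team labels but gives NaN where one side is missing.
naive = home + away

# Series.add with fill_value=0 treats the missing side as zero goals.
total = home.add(away, fill_value=0).astype(int)

print(total.sort_values(ascending=False))
print(total.idxmax())  # label of the team with the most goals
```

On the sample data in the question every team plays both at home and away, so the plain + already gives the right totals there.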

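Reshape with pd.wide_to_long. The advantage is that it automatically creates a 'home_or_away' indicator, but first we need to rename the columns so that they read 'score_home' rather than 'home_score'.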

# Swap column stubs around `'_'`
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]

# Filter as in the question (left commented out because it would drop
# every row of the sample data you provided)
# df['date'] = pd.to_datetime(df['date'])
# df = df[df['date'].dt.year.ge(2010) & df['tournament'].ne('Friendly')]

df = pd.wide_to_long(df, i='date', j='home_or_away',
                     stubnames=['team', 'score'], sep='_', suffix='.*')

#                          country  neutral tournament     city      team  score
#date       home_or_away                                                        
#1872-11-30 home          Scotland    False   Friendly  Glasgow  Scotland      0
#1873-03-08 home           England    False   Friendly   London   England      4
#1874-03-07 home          Scotland    False   Friendly  Glasgow  Scotland      2
#...
#1878-03-02 away          Scotland    False   Friendly  Glasgow   England      2
#1878-03-23 away          Scotland    False   Friendly  Glasgow     Wales      0
#1879-01-18 away           England    False   Friendly   London     Wales      1

So now you can total the goals regardless of whether a team played at home or away:

df.groupby('team')['score'].sum()
#team
#England     12
#Scotland    34
#Wales        1
#Name: score, dtype: int64
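
Note that pd.wide_to_long requires the i column(s) to uniquely identify each row. That holds for the sample above, but a larger dataset may well have several matches on the same date. A minimal sketch of one workaround, assuming a helper column named match_id (a hypothetical name, not part of the original data) and made-up rows:

```python
import pandas as pd

# Made-up matches; two of them share a date, so 'date' alone is not unique.
df = pd.DataFrame({
    "date": ["2010-06-11", "2010-06-11", "2010-06-12"],
    "home_team": ["A", "C", "B"],
    "away_team": ["B", "D", "C"],
    "home_score": [1, 0, 2],
    "away_score": [1, 3, 2],
})

# Swap the stubs so the columns read team_home, score_home, and so on.
df.columns = ["_".join(x[::-1]) for x in df.columns.str.split("_")]

# 'match_id' is a hypothetical helper column giving every row a unique key.
df["match_id"] = range(len(df))

long_df = pd.wide_to_long(df, stubnames=["team", "score"], i="match_id",
                          j="home_or_away", sep="_", suffix=".*")

# One row per (match, side), so a single groupby covers home and away goals.
totals = long_df.groupby("team")["score"].sum()
print(totals)
```

The date column is carried along as an ordinary column in the reshaped frame, so it can still be used for filtering afterwards.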
