How to sum values of one column based on other columns in pandas?

I am working with a DataFrame that looks like this (text version below):

I am supposed to calculate which country has scored the most goals in tournaments since 2010. So far I have managed to filter out friendlies like this:

no_friendlies = df[df.tournament != "Friendly"]

Then I set the date column as the index in order to filter out all matches before 2010:

no_friendlies_indexed = no_friendlies.set_index('date')
since_2010 = no_friendlies_indexed.loc['2010-01-01':]
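
For comparison, the same two filters can also be sketched as a single boolean mask, assuming the date column is first converted to a real datetime. The small frame below is made-up illustration data, not the dataset from the question, and only the column names are taken from it.

```python
import pandas as pd

# Made-up rows for illustration only (not the real dataset).
df = pd.DataFrame({
    "date": ["2009-06-01", "2010-01-01", "2012-06-10", "2014-07-01"],
    "home_team": ["A", "B", "A", "C"],
    "away_team": ["B", "A", "C", "A"],
    "home_score": [1, 2, 0, 3],
    "away_score": [0, 2, 1, 1],
    "tournament": ["Cup", "Friendly", "Cup", "Cup"],
})

# Parse the strings so comparisons are done on dates, not on text.
df["date"] = pd.to_datetime(df["date"])

# Keep matches from 2010-01-01 onward that are not friendlies.
mask = (df["date"] >= "2010-01-01") & (df["tournament"] != "Friendly")
since_2010 = df[mask]
print(since_2010)
```

This keeps date as an ordinary column, so it does not rely on the index being sorted.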

I am pretty lost from this point onward, as I can't figure out how to sum the goals scored by each country both home and away.

Any help/advice is appreciated!

EDIT:

Text version of sample data:

    date        home_team  away_team  home_score  away_score  tournament  city     country   neutral
0   1872-11-30  Scotland   England    0           0           Friendly    Glasgow  Scotland  False
1   1873-03-08  England    Scotland   4           2           Friendly    London   England   False
2   1874-03-07  Scotland   England    2           1           Friendly    Glasgow  Scotland  False
3   1875-03-06  England    Scotland   2           2           Friendly    London   England   False
4   1876-03-04  Scotland   England    3           0           Friendly    Glasgow  Scotland  False
5   1876-03-25  Scotland   Wales      4           0           Friendly    Glasgow  Scotland  False
6   1877-03-03  England    Scotland   1           3           Friendly    London   England   False
7   1877-03-05  Wales      Scotland   0           2           Friendly    Wrexham  Wales     False
8   1878-03-02  Scotland   England    7           2           Friendly    Glasgow  Scotland  False
9   1878-03-23  Scotland   Wales      9           0           Friendly    Glasgow  Scotland  False
10  1879-01-18  England    Wales      2           1           Friendly    London   England   False

EDIT 2:

I have just tried doing this:

since_2010.groupby(['home_team', 'home_score']).sum()

But it doesn't return the sum of home goals scored by the home teams (if this worked, I would just repeat it for the away teams to get the total).

Use .groupby and .sum() for the home team, then do the same for the away team and add the two together:

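# .add(..., fill_value=0) keeps teams that appear only at home or only away; a plain + would give NaN for them
df_new = df.groupby('home_team')['home_score'].sum().add(df.groupby('away_team')['away_score'].sum(), fill_value=0)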

Output:

England     12
Scotland    34
Wales        1

More detailed explanation (per comment):

  1. You only need to .groupby one column, home_team. In your attempt, you were grouping by ['home_team', 'home_score']. Your goal (no pun intended) is to get the .sum() of home_score, so you should NOT .groupby() it. As you can see, ['home_score'] comes after the .groupby part, so that the .sum() is taken over it. That gets you set for the home teams.
  2. Then, you do the same for the away_team.
  3. At that point pandas aligns the two results on their index (the country names), so you can simply add them together. A country missing from one of the two results would come out as NaN with a plain +; .add(..., fill_value=0) avoids that.
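
The three steps above can be sketched end to end as follows. This is a minimal illustration on made-up matches (the team names A, B, C and the variable names are assumptions, only the column names come from the question); team C deliberately never plays at home to show the alignment issue.

```python
import pandas as pd

# Made-up matches; team C never plays at home.
matches = pd.DataFrame({
    "home_team": ["A", "B", "A"],
    "away_team": ["B", "C", "C"],
    "home_score": [2, 1, 0],
    "away_score": [1, 3, 2],
})

# Grouping by both columns gives one group per (team, score) pair,
# so home_score ends up in the index instead of being summed.
per_pair = matches.groupby(["home_team", "home_score"]).size()

# Steps 1 and 2: group by the team only, then sum the score column.
home_goals = matches.groupby("home_team")["home_score"].sum()
away_goals = matches.groupby("away_team")["away_score"].sum()

# Step 3: a plain + aligns on team name but yields NaN
# for teams that are missing from one side.
naive_total = home_goals + away_goals

# .add(..., fill_value=0) treats a missing side as zero goals.
total_goals = home_goals.add(away_goals, fill_value=0).astype(int)

print(total_goals.sort_values(ascending=False))
print(total_goals.idxmax())  # team with the most goals
```

To answer the original question, the same lines would be run on the filtered frame (since_2010) rather than on the full data.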

Use pd.wide_to_long to reshape. The benefit is that it automatically creates a 'home_or_away' indicator, but we first need to rename the columns so that they are 'score_home' (as opposed to 'home_score').

# Swap column stubs around `'_'`, e.g. 'home_score' -> 'score_home'
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]

# Your filter (since 2010, no friendlies) would drop everything in your provided example
# df['date'] = pd.to_datetime(df['date'])
# df = df[df['date'].dt.year.ge(2010) & df['tournament'].ne('Friendly')]

# i must uniquely identify each row; 'date' does in this sample
df = pd.wide_to_long(df, i='date', j='home_or_away',
                     stubnames=['team', 'score'], sep='_', suffix='.*')

#                          country  neutral tournament     city      team  score
#date       home_or_away                                                        
#1872-11-30 home          Scotland    False   Friendly  Glasgow  Scotland      0
#1873-03-08 home           England    False   Friendly   London   England      4
#1874-03-07 home          Scotland    False   Friendly  Glasgow  Scotland      2
#...
#1878-03-02 away          Scotland    False   Friendly  Glasgow   England      2
#1878-03-23 away          Scotland    False   Friendly  Glasgow     Wales      0
#1879-01-18 away           England    False   Friendly   London     Wales      1

So now, regardless of home or away, you can get the goals scored:

df.groupby('team')['score'].sum()
#team
#England     12
#Scotland    34
#Wales        1
#Name: score, dtype: int64
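
One caveat: pd.wide_to_long raises a ValueError when the i column does not uniquely identify each row, which is likely in a full dataset where several matches share a date. A possible workaround, sketched here on made-up data, is to add a unique identifier and use that as i. The match_id column is a hypothetical helper, not something from the original data.

```python
import pandas as pd

# Made-up matches; two of them share the same date.
matches = pd.DataFrame({
    "date": ["2010-06-11", "2010-06-11", "2010-06-12"],
    "home_team": ["A", "C", "B"],
    "away_team": ["B", "D", "A"],
    "home_score": [1, 0, 2],
    "away_score": [1, 3, 2],
})

# Swap column stubs around '_': 'home_team' -> 'team_home', etc.
matches.columns = ["_".join(c.split("_")[::-1]) for c in matches.columns]

# Hypothetical helper column: one unique id per match.
matches["match_id"] = range(len(matches))

long_df = pd.wide_to_long(
    matches,
    stubnames=["team", "score"],
    i="match_id",
    j="home_or_away",
    sep="_",
    suffix=".+",
)

# One row per (match, side), so a single groupby covers home and away.
totals = long_df.groupby("team")["score"].sum()
print(totals)
```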
