How to sum values of one column based on other columns in pandas?
I am working with a dataframe that looks like this (text version below):
I am supposed to calculate which country has scored the most goals in tournaments since 2010. So far I have managed to filter out friendlies like this:
no_friendlies = df[df.tournament != "Friendly"]
Then I set the date column as the index in order to filter out all matches before 2010:
no_friendlies_indexed = no_friendlies.set_index('date')
since_2010 = no_friendlies_indexed.loc['2010-01-01':]
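A possible variant of these two filtering steps, combined into one boolean mask. This is only a sketch: the three-row `df` below is made-up data (not taken from the real dataset), and it assumes the `date` column can be parsed by `pd.to_datetime`.

```python
import pandas as pd

# Made-up rows purely for illustration; the real data has more columns.
df = pd.DataFrame({
    "date": ["2009-06-01", "2010-06-11", "2014-07-13"],
    "home_team": ["A", "B", "C"],
    "away_team": ["B", "C", "A"],
    "home_score": [1, 2, 3],
    "away_score": [0, 1, 1],
    "tournament": ["FIFA World Cup qualification", "Friendly", "FIFA World Cup"],
})

# Parse the dates so comparisons are chronological rather than string-based.
df["date"] = pd.to_datetime(df["date"])

# Keep non-friendly matches played on or after 2010-01-01.
mask = df["tournament"].ne("Friendly") & (df["date"] >= pd.Timestamp("2010-01-01"))
since_2010 = df[mask]
```

Keeping `date` as a regular column (rather than the index) leaves it available for later grouping or reshaping.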
I am pretty lost from this point onward, as I can't figure out how to sum the goals scored by each country both home and away.
Any help/advice is appreciated!
EDIT:
Text version of sample data:
date home_team away_team home_score away_score tournament city country neutral
0 1872-11-30 Scotland England 0 0 Friendly Glasgow Scotland False
1 1873-03-08 England Scotland 4 2 Friendly London England False
2 1874-03-07 Scotland England 2 1 Friendly Glasgow Scotland False
3 1875-03-06 England Scotland 2 2 Friendly London England False
4 1876-03-04 Scotland England 3 0 Friendly Glasgow Scotland False
5 1876-03-25 Scotland Wales 4 0 Friendly Glasgow Scotland False
6 1877-03-03 England Scotland 1 3 Friendly London England False
7 1877-03-05 Wales Scotland 0 2 Friendly Wrexham Wales False
8 1878-03-02 Scotland England 7 2 Friendly Glasgow Scotland False
9 1878-03-23 Scotland Wales 9 0 Friendly Glasgow Scotland False
10 1879-01-18 England Wales 2 1 Friendly London England False
EDIT 2:
I have just tried doing this:
since_2010.groupby(['home_team', 'home_score']).sum()
But it doesn't return the sum of home goals scored by the home teams (if this worked, I would just repeat it for the away teams to get the total).
You can use .groupby and .sum() for the home team, then do the same for the away team and add the two together:
df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()
Output:
England 12
Scotland 34
Wales 1
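One caveat with `+`: pandas aligns the two Series on team name, so a team that appears only as a home side or only as an away side ends up as `NaN`. A hedged sketch using `Series.add` with `fill_value=0`, which treats the missing side as zero goals; the small `df` here is invented for illustration.

```python
import pandas as pd

# Invented data: "C" only ever plays away, "A" only at home.
df = pd.DataFrame({
    "home_team": ["A", "A", "B"],
    "away_team": ["B", "C", "C"],
    "home_score": [2, 1, 0],
    "away_score": [1, 3, 2],
})

home = df.groupby("home_team")["home_score"].sum()
away = df.groupby("away_team")["away_score"].sum()

# fill_value=0 treats a team missing from one side as having scored 0 there.
total = home.add(away, fill_value=0).astype(int)

# Team with the most goals overall.
top_team = total.idxmax()
```

With the sample data from the question, every team appears on both sides, so plain `+` and `.add(..., fill_value=0)` give the same totals.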
More detailed explanation (per comment):
You should only .groupby one column, home_team. In your attempt, you were grouping by ['home_team', 'home_score']. Your goal (no pun intended) is to get the .sum() of home_score, so you should NOT .groupby() it. As you can see, ['home_score'] comes after the part where I use .groupby, so that I can get the .sum() of it. That gets you set for the home teams. Then you do the same for away_team. Since the home_team and away_team groups have the same values for countries, you can simply add them together.

Use pd.wide_to_long to reshape. The benefit is that it automatically creates a 'home_or_away' indicator, but we will first rename the columns so that they read 'score_home' (as opposed to 'home_score').
# Swap the column stubs around '_' so that 'home_score' becomes 'score_home', etc.
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]
# Filtering as in the question would drop every row of the provided sample,
# so it is left commented out here:
# df['date'] = pd.to_datetime(df['date'])
# df = df[df['date'].dt.year.ge(2010) & df['tournament'].ne('Friendly')]
# Note: i='date' must uniquely identify each row, which holds for this sample.
df = pd.wide_to_long(df, i='date', j='home_or_away',
                     stubnames=['team', 'score'], sep='_', suffix='.*')
# country neutral tournament city team score
#date home_or_away
#1872-11-30 home Scotland False Friendly Glasgow Scotland 0
#1873-03-08 home England False Friendly London England 4
#1874-03-07 home Scotland False Friendly Glasgow Scotland 2
#...
#1878-03-02 away Scotland False Friendly Glasgow England 2
#1878-03-23 away Scotland False Friendly Glasgow Wales 0
#1879-01-18 away England False Friendly London Wales 1
So now, regardless of home or away, you can get the goals scored:
df.groupby('team')['score'].sum()
#team
#England 12
#Scotland 34
#Wales 1
#Name: score, dtype: int64
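On a larger dataset, several matches are likely to share a date, and `pd.wide_to_long` raises an error when `i` does not uniquely identify each row. A minimal sketch of one way around that, assuming a per-row identifier is acceptable: the `match_id` column and the three-row `df` are hypothetical, introduced only for this example.

```python
import pandas as pd

# Invented data in which two matches share the same date.
df = pd.DataFrame({
    "date": ["2010-06-11", "2010-06-11", "2014-07-13"],
    "home_team": ["A", "B", "C"],
    "away_team": ["B", "C", "A"],
    "home_score": [1, 2, 3],
    "away_score": [0, 1, 1],
})

# Swap the stubs so the columns read team_home, score_home, and so on.
df.columns = ["_".join(x[::-1]) for x in df.columns.str.split("_")]

# 'date' is not unique here, so use the row number as a hypothetical match_id.
df = df.reset_index().rename(columns={"index": "match_id"})

long_df = pd.wide_to_long(df, stubnames=["team", "score"], i="match_id",
                          j="home_or_away", sep="_", suffix=r"\w+")

# Total goals per team, and the team with the most goals.
totals = long_df.groupby("team")["score"].sum()
top_team = totals.idxmax()
```

`idxmax()` returns the index label of the largest total, which answers the "which country scored the most" part of the question directly.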