Pandas groupby() on one column and then sum on another

Question

I have a data frame with a number of columns but with three I am interested in. These are name , year and goals_scored . None of these columns are unique in that for example I have lines like the following:

Name           Year     Goals_scored
John Smith     2014     3
John Smith     2014     2
John Smith     2014     0
John Smith     2015     1
John Smith     2015     1
John Smith     2015     2
John Smith     2015     1
John Smith     2015     0
John Smith     2016     1
John Smith     2016     0

What I am trying to do is create a new data frame where I have 4 columns. One for name, then one for each of the years 2014, 2015 and 2016. The last three columns being a sum of the goals_scored for the year in question. So using the data above it would look like:

Name          2014     2015     2016
John Smith    5        5        1

To make it worse they only want it to include those names that have something for all three years.

Can anyone point me in the right direction?

Answer 1

Need groupby , aggregate sum and reshape by unstack :

df = df.groupby(['Name','Year'])['Goals_scored'].sum().unstack()
print (df)
Year        2014  2015  2016
Name                        
John Smith     5     5     1

Alternative pivot_table :

df = df.pivot_table(index='Name',columns='Year', values='Goals_scored', aggfunc='sum')
print (df)
Year        2014  2015  2016
Name                        
John Smith     5     5     1

Last for column from index:

df = df.reset_index().rename_axis(None, 1)
print (df)
         Name  2014  2015  2016
0  John Smith     5     5     1

Pandas groupby() on one column and then sum on another

Question

1 answers

solution1
5 ACCPTED 2017-10-17 11:06:44

Pandas groupby() on one column and then sum on another

Question

1 answers

solution1 5 ACCPTED 2017-10-17 11:06:44

solution1
5 ACCPTED 2017-10-17 11:06:44