creating panda series based on dataframe column sums

Question

Given this dataframe:

df
       animal location  number
0       lions     zoo1      10
1      tigers     zoo2     200
2       bears     zoo2     300
3        ohmy     zoo2     400
4   antelopes     zoo1      20
5  orangutans     zoo2     500
6      zebras     zoo1      30

I need to find the top three animals by zoo, sum them independent of animal type, and then rank which zoo has the most animals. I've gotten to the point where I can select and group by location and animal, I think what I need to do now is create a series with the location as the key and the sum of each location as the value, but I'm stuck. (If that's not the best way to approach it, I'm open to suggestions.)

df.set_index(['location','number'], inplace=True)
df.groupby(level=[0,1]).sum().reset_index().sort_values(['location', 'number'], ascending=[1,0]).groupby('location').head(3)
  location  number      animal
2     zoo1      30      zebras
1     zoo1      20   antelopes
0     zoo1      10       lions
6     zoo2     500  orangutans
5     zoo2     400        ohmy
4     zoo2     300       bears

(The sum() doesn't seem to be doing anything in this context, but I have to have something for the groupby() and I don't know if there's anything better.) What I'm looking for from here is:

zoo2   1200
zoo1   60

Code to create the initial df:

df = pd.DataFrame({'animal': ['lions', 'tigers', 'bears', 'ohmy', 'antelopes', 'orangutans','zebras'],
    'location': ['zoo1', 'zoo2', 'zoo2', 'zoo2', 'zoo1', 'zoo2', 'zoo1'],
    'number': [10, 200, 300, 400, 20, 500, 30]})

Answer 1

This can be done using groupby :

df.groupby('location')['number'].sum()

will return this as output:

location
zoo1      60
zoo2    1200
Name: number, dtype: int64

Answer 2

using your initial data frame:

df
       animal location  number
0       lions     zoo1      10
1      tigers     zoo2     200
2       bears     zoo2     300
3        ohmy     zoo2     400
4   antelopes     zoo1      20
5  orangutans     zoo2     500
6      zebras     zoo1      30

Use groupby :

df.groupby('location')['number'].apply(lambda x: x.sort_values(ascending=False).head(3).sum())

Output:

location
zoo1      60
zoo2    1200
Name: number, dtype: int64

creating panda series based on dataframe column sums

Question

2 answers

solution1
1 2019-10-10 18:34:06

solution2
0 ACCPTED 2019-10-10 18:31:40

creating panda series based on dataframe column sums

Question

2 answers

solution1 1 2019-10-10 18:34:06

solution2 0 ACCPTED 2019-10-10 18:31:40

solution1
1 2019-10-10 18:34:06

solution2
0 ACCPTED 2019-10-10 18:31:40