简体   繁体   中英

creating panda series based on dataframe column sums

Given this dataframe:

df
       animal location  number
0       lions     zoo1      10
1      tigers     zoo2     200
2       bears     zoo2     300
3        ohmy     zoo2     400
4   antelopes     zoo1      20
5  orangutans     zoo2     500
6      zebras     zoo1      30

I need to find the top three animals by zoo, sum them independent of animal type, and then rank which zoo has the most animals. I've gotten to the point where I can select and group by location and animal, I think what I need to do now is create a series with the location as the key and the sum of each location as the value, but I'm stuck. (If that's not the best way to approach it, I'm open to suggestions.)

df.set_index(['location','number'], inplace=True)
df.groupby(level=[0,1]).sum().reset_index().sort_values(['location', 'number'], ascending=[1,0]).groupby('location').head(3)
  location  number      animal
2     zoo1      30      zebras
1     zoo1      20   antelopes
0     zoo1      10       lions
6     zoo2     500  orangutans
5     zoo2     400        ohmy
4     zoo2     300       bears

(The sum() doesn't seem to be doing anything in this context, but I have to have something for the groupby() and I don't know if there's anything better.) What I'm looking for from here is:

zoo2   1200
zoo1   60

Code to create the initial df:

df = pd.DataFrame({'animal': ['lions', 'tigers', 'bears', 'ohmy', 'antelopes', 'orangutans','zebras'],
    'location': ['zoo1', 'zoo2', 'zoo2', 'zoo2', 'zoo1', 'zoo2', 'zoo1'],
    'number': [10, 200, 300, 400, 20, 500, 30]})

This can be done using groupby :

df.groupby('location')['number'].sum()

will return this as output:

location
zoo1      60
zoo2    1200
Name: number, dtype: int64

using your initial data frame:

df
       animal location  number
0       lions     zoo1      10
1      tigers     zoo2     200
2       bears     zoo2     300
3        ohmy     zoo2     400
4   antelopes     zoo1      20
5  orangutans     zoo2     500
6      zebras     zoo1      30

Use groupby :

df.groupby('location')['number'].apply(lambda x: x.sort_values(ascending=False).head(3).sum())

Output:

location
zoo1      60
zoo2    1200
Name: number, dtype: int64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM