
Group by multiple columns and get a sum and count

I'm trying to build a single data frame that shows the 5 most frequently banned characters by year, season and league. My initial df looks like this:

    League  Year    Season  ban_1   ban_2   ban_3   ban_4   ban_5
0   NALCS   2015    Spring  Rumble  Kassadin Lissandra NaN NaN
1   NALCS   2015    Spring  Tristana Leblanc Nidalee NaN NaN
2   NALCS   2015    Spring  Kassadin Sivir  Lissandra NaN NaN
3   NALCS   2015    Spring  RekSai  Janna   Leblanc NaN NaN
4   NALCS   2015    Spring  JarvanIV Lissandra Kassadin NaN NaN

and I want it to look something like this at the end:

Year    Season  League  Top 5 bans
2015    Spring  EULCS   [(Zed, 49), (Kassadin, 39), (Cassiopeia, 34), (RekSai, 33), (Nidalee, 30)]

At this point I've been trying to make sense of it, so I tried this:

bans_info.groupby(['Year','Season', 'League', 'ban_1', 'ban_2', 'ban_3', 'ban_4', 'ban_5',]).sum()

and this:

bans_info.groupby(['Year', 'Season', 'League']).ban_1.value_counts()

but I still don't get the result I want. In the end I tried to count each ban column separately, but it becomes too messy:

b1 = bans_info.groupby(['Year', 'Season', 'League']).ban_1.value_counts()
b2 = bans_info.groupby(['Year', 'Season', 'League']).ban_2.value_counts()
b12 = pd.merge(b1, b2, how='outer', on ='Year')

You need to use .agg and pass it a dictionary that maps column names to the functions to apply.

You can find more detail here.
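For the specific "top 5 bans" output, one possible sketch is shown below. Note that it does not use the dictionary form of .agg; instead it reshapes the five ban columns into a single column with melt and then counts per group with value_counts. The sample frame is rebuilt from the five rows in the question, and the names ban_cols, keys, long_bans and top_bans are hypothetical helpers introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the five rows shown in the question.
bans_info = pd.DataFrame({
    "League": ["NALCS"] * 5,
    "Year": [2015] * 5,
    "Season": ["Spring"] * 5,
    "ban_1": ["Rumble", "Tristana", "Kassadin", "RekSai", "JarvanIV"],
    "ban_2": ["Kassadin", "Leblanc", "Sivir", "Janna", "Lissandra"],
    "ban_3": ["Lissandra", "Nidalee", "Lissandra", "Leblanc", "Kassadin"],
    "ban_4": [np.nan] * 5,
    "ban_5": [np.nan] * 5,
})

ban_cols = ["ban_1", "ban_2", "ban_3", "ban_4", "ban_5"]
keys = ["Year", "Season", "League"]

# Stack the five ban columns into one "ban" column, dropping empty slots.
long_bans = (
    bans_info.melt(id_vars=keys, value_vars=ban_cols, value_name="ban")
    .dropna(subset=["ban"])
)

def top_bans(s, n=5):
    """Return the n most frequent values in s as (name, count) tuples."""
    counts = s.value_counts().head(n)
    return list(zip(counts.index, counts.tolist()))

# One row per (Year, Season, League) with its list of top bans.
top5 = (
    long_bans.groupby(keys)["ban"]
    .apply(top_bans)
    .reset_index(name="Top 5 bans")
)

print(top5)
```

Champions with the same count may appear in either order, since value_counts does not promise a particular order for ties.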
