This is a sample of my dataframe (seasonal_name):
season winner count team_api_id team_long_name
1 2008/2009 10260 28 10260.0 Manchester United
2 2008/2009 8634 27 8634.0 FC Barcelona
3 2008/2009 8548 26 8548.0 Rangers
4 2008/2009 8650 25 8650.0 Liverpool
5 2008/2009 8636 25 8636.0 Inter
I ran this:
seasonal_name.groupby("season")["count"].max()
and its output looks like this:
season
2008/2009 28
2009/2010 31
2010/2011 30
2011/2012 32
2012/2013 32
I want the output to contain the corresponding team_long_name and the other columns as well, not just the count column.
If you're okay with dropping ties (e.g. if two different teams share the highest count for a season, only one of them is displayed), you can do the following:
new_df = df.loc[df.groupby(["Season"])["Count"].idxmax()]
This first groups by Season and gets the index labels of the rows having the maximum Count for each season. We then locate those rows in df using loc.
The only issue is that idxmax returns only one row per season (the first occurrence), even if the maximum Count value for that season exists in multiple rows. Still, I hope this is a good starting point for you.
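If you do need every tied row, one possible alternative (not part of the one-liner above) is to compare each row's Count with its season's maximum using transform. This is only a sketch: the frame below is a made-up example with deliberate ties, and the names season_max and all_winners are hypothetical.

```python
import pandas as pd

# Hypothetical frame with ties: two groups share the top Count in each season.
df = pd.DataFrame({
    "Group": ["G", "G", "H", "H", "I"],
    "Count": [2, 3, 2, 4, 4],
    "Season": ["spring", "summer", "spring", "summer", "summer"],
})

# transform("max") broadcasts each season's maximum back onto every row,
# so the result is aligned with df and can be used in a boolean mask.
season_max = df.groupby("Season")["Count"].transform("max")

# Keep every row whose Count equals its season's maximum (ties included).
all_winners = df[df["Count"] == season_max]
print(all_winners)
```

Unlike idxmax, this keeps all rows that reach the maximum, so the result can contain more than one row per season.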
This is the code in action. The following is the example dataframe I used:
Group Count Season
0 G 2 spring
1 G 3 summer
2 H 1 spring
3 H 4 summer
And this is what my line of code gives for new_df:
Group Count Season
0 G 2 spring
3 H 4 summer
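For anyone who wants to reproduce this, here is a small self-contained sketch that rebuilds the example dataframe above and applies the same line. The intermediate name idx is only there for readability.

```python
import pandas as pd

# Rebuild the example dataframe shown above.
df = pd.DataFrame({
    "Group": ["G", "G", "H", "H"],
    "Count": [2, 3, 1, 4],
    "Season": ["spring", "summer", "spring", "summer"],
})

# Index label of the row holding the maximum Count within each Season.
idx = df.groupby("Season")["Count"].idxmax()

# Select those full rows, so all columns are kept.
new_df = df.loc[idx]
print(new_df)
```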
If you'd like to reset the index of new_df, you can just do:
new_df = new_df.reset_index(drop=True)
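Applied to the column names in the question (lowercase season and count), the same idea might look like the sketch below. It uses only the five sample rows shown in the question, so there is a single season; the name top_per_season is hypothetical.

```python
import pandas as pd

# The five sample rows from the question, all from season 2008/2009.
seasonal_name = pd.DataFrame(
    {
        "season": ["2008/2009"] * 5,
        "winner": [10260, 8634, 8548, 8650, 8636],
        "count": [28, 27, 26, 25, 25],
        "team_api_id": [10260.0, 8634.0, 8548.0, 8650.0, 8636.0],
        "team_long_name": [
            "Manchester United", "FC Barcelona", "Rangers", "Liverpool", "Inter",
        ],
    },
    index=[1, 2, 3, 4, 5],
)

# Pick the full row with the highest count per season, then renumber from 0.
top_per_season = (
    seasonal_name.loc[seasonal_name.groupby("season")["count"].idxmax()]
    .reset_index(drop=True)
)
print(top_per_season)
```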
Hope that helps.