简体   繁体   中英

Getting max value and index with max value at the same time from a pandas dataframe

Suppose I have the following dataframe

  Country  Year  Count
0     USA  2021   1500
1     USA  2018   6000
2   India  2019   3000
3   India  2021   5000
4      UK  2019   4000
5     USA  2019   3200
6   India  2018   5000

I want to print the following

Entry with Max count is (USA, 2018, 6000)

Country with max total count is: (India, 13000)

Entry with max count in each year is:
2018, USA, 6000
2019, UK, 4000
2021, India, 5000

The code below works. But a couple of questions to see if I can do better

  1. Any way to get maximum index and maximum value at same time instead of getting maxidx and then getting the values in it?
  2. Any cleaner and simpler to get all the three quantities I want?
# Print (country, year, count) of the row with max count among all entries
max_idx = df['Count'].idxmax()
print("Entry with Max count is (" + \
      str(df.loc[max_idx]['Country']) + ", " \
      + str(df.loc[max_idx]['Year']) + ", " \
      + str(df.loc[max_idx]['Count']) + ")" )

# Print country with max total count and print (country, max total count)
country_sum = pd.pivot_table(df, index='Country', aggfunc=np.sum)
print("\nCountry with max total count is: ("\
      + country_sum['Count'].idxmax() + ", "\
      + str(country_sum['Count'].max())\
      + ")")


# Print country with max count in each year
year_country_groupby = df.groupby('Year')
print('\nEntry with max count in each year is:')
for key, gdf in year_country_groupby:
    max_idx = gdf['Count'].idxmax()
    print(str(key) + ", "\
          + str(gdf.loc[max_idx]['Country']) + ", "\
          + str(df.loc[max_idx]['Count']))

You can simplify your output like this:

# 1st output
cty, year, cnt = df.loc[df['Count'].idxmax()]
print(f"Entry with Max count is ({cty}, {year}, {cnt})")

# 2nd output
cty, cnt = df.groupby('Country')['Count'].sum().nlargest(1).reset_index().squeeze()
print(f"Country with max total count is: ({cty}, {cnt})")

# 3rd output
print("Entry with max count in each year is:")
for _, (cty, year, cnt) in df.loc[df.groupby('Year')['Count'].idxmax()].iterrows():
    print(f"{year}, {cty}, {cnt}")

Output:

Entry with Max count is (USA, 2018, 6000)

Country with max total count is: (India, 13000)

Entry with max count in each year is:
2018, USA, 6000
2019, UK, 4000
2021, India, 5000

Update To get both max index and value, you can use agg :

idxmax, valmax = df['Count'].agg(['idxmax', 'max'])
print(idxmax, valmax)

# Output:
1 6000

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM