
Getting max value and index with max value at the same time from a pandas dataframe

Suppose I have the following dataframe:

  Country  Year  Count
0     USA  2021   1500
1     USA  2018   6000
2   India  2019   3000
3   India  2021   5000
4      UK  2019   4000
5     USA  2019   3200
6   India  2018   5000
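
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the dtypes (strings for Country, integers for Year and Count) are assumed from the printed table.

```python
import pandas as pd

# Rebuild the example frame shown above.
# Assumption: Country holds strings, Year and Count hold plain integers.
df = pd.DataFrame({
    "Country": ["USA", "USA", "India", "India", "UK", "USA", "India"],
    "Year": [2021, 2018, 2019, 2021, 2019, 2019, 2018],
    "Count": [1500, 6000, 3000, 5000, 4000, 3200, 5000],
})
print(df)
```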

I want to print the following:

Entry with Max count is (USA, 2018, 6000)

Country with max total count is: (India, 13000)

Entry with max count in each year is:
2018, USA, 6000
2019, UK, 4000
2021, India, 5000

The code below works, but I have a couple of questions to see if I can do better:

  1. Is there any way to get the index of the maximum and the maximum value at the same time, instead of getting the index with idxmax and then looking up the values at that index?
  2. Is there a cleaner and simpler way to get all three quantities I want?

import numpy as np
import pandas as pd

# Print (country, year, count) of the row with max count among all entries
max_idx = df['Count'].idxmax()
print("Entry with Max count is (" + \
      str(df.loc[max_idx]['Country']) + ", " \
      + str(df.loc[max_idx]['Year']) + ", " \
      + str(df.loc[max_idx]['Count']) + ")" )

# Print country with max total count and print (country, max total count)
country_sum = pd.pivot_table(df, index='Country', aggfunc=np.sum)
print("\nCountry with max total count is: ("\
      + country_sum['Count'].idxmax() + ", "\
      + str(country_sum['Count'].max())\
      + ")")


# Print country with max count in each year
year_country_groupby = df.groupby('Year')
print('\nEntry with max count in each year is:')
for key, gdf in year_country_groupby:
    max_idx = gdf['Count'].idxmax()
    print(str(key) + ", "\
          + str(gdf.loc[max_idx]['Country']) + ", "\
          + str(gdf.loc[max_idx]['Count']))

You can simplify your output like this:

# 1st output
cty, year, cnt = df.loc[df['Count'].idxmax()]
print(f"Entry with Max count is ({cty}, {year}, {cnt})")

# 2nd output
cty, cnt = df.groupby('Country')['Count'].sum().nlargest(1).reset_index().squeeze()
print(f"Country with max total count is: ({cty}, {cnt})")

# 3rd output
print("Entry with max count in each year is:")
for _, (cty, year, cnt) in df.loc[df.groupby('Year')['Count'].idxmax()].iterrows():
    print(f"{year}, {cty}, {cnt}")

Output:

Entry with Max count is (USA, 2018, 6000)

Country with max total count is: (India, 13000)

Entry with max count in each year is:
2018, USA, 6000
2019, UK, 4000
2021, India, 5000

Update: To get both the index of the max and the max value at once, you can use agg:

idxmax, valmax = df['Count'].agg(['idxmax', 'max'])
print(idxmax, valmax)

# Output:
1 6000
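
The same agg idea also appears to carry over to the grouped cases in the question. The following is a minimal sketch, not a definitive recipe: it assumes the example frame from the question, and the names totals, top_country, top_total and per_year are made up for illustration.

```python
import pandas as pd

# Example frame from the question (dtypes assumed from the printed table).
df = pd.DataFrame({
    "Country": ["USA", "USA", "India", "India", "UK", "USA", "India"],
    "Year": [2021, 2018, 2019, 2021, 2019, 2019, 2018],
    "Count": [1500, 6000, 3000, 5000, 4000, 3200, 5000],
})

# Total count per country, then the label and value of the maximum in one agg call.
totals = df.groupby("Country")["Count"].sum()
top_country, top_total = totals.agg(["idxmax", "max"])
print(f"Country with max total count is: ({top_country}, {top_total})")

# Per year: the row label of the max count and the max count itself.
per_year = df.groupby("Year")["Count"].agg(["idxmax", "max"])
# Look up the country through the row labels that idxmax returned.
per_year["Country"] = df.loc[per_year["idxmax"], "Country"].to_numpy()
print(per_year)
```

Note that idxmax returns the first label when several rows tie for the maximum, so ties are resolved by row order here.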
