
Summarize values in pandas data frames

I want to calculate the maximum value for each year and show that value together with its sector. For example, from the screenshot, I would like to display:

2010: Telecom 781
2011: Tech 973

I have tried using:

df.groupby(['Year', 'Sector'])['Revenue'].max()

but this does not give me the name of the Sector that has the highest value.

Try using idxmax and loc:

df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]

MVCE:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'Sector':['Telecom','Tech','Financial Service','Construction','Heath Care']*3,
                   'Year':[2010,2011,2012,2013,2014]*3,
                   'Revenue':np.random.randint(101,999,15)})

df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]

Output:

               Sector  Year  Revenue
3        Construction  2013      423
12  Financial Service  2012      838
9          Heath Care  2014      224
1                Tech  2011      466
5             Telecom  2010      843
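In the sample data above each sector appears in only one year, so grouping on both columns and grouping on Year alone pick the same rows. When several sectors share a year, grouping on just Year is what gives one row per year. A minimal sketch of that case, assuming a small made-up frame (the values below are hypothetical and only borrow the two figures from the question):

```python
import pandas as pd

# Hypothetical data: two sectors per year, so the per-year maximum is a real choice.
df = pd.DataFrame({
    'Sector': ['Telecom', 'Tech', 'Telecom', 'Tech'],
    'Year': [2010, 2010, 2011, 2011],
    'Revenue': [781, 520, 640, 973],
})

# idxmax gives, for each year, the index label of the row with the largest Revenue.
# loc then selects those full rows, so the Sector column comes along.
top_per_year = df.loc[df.groupby('Year')['Revenue'].idxmax()]

# Format each row the way the question asks for it.
lines = [f"{row.Year}: {row.Sector} {row.Revenue}" for row in top_per_year.itertuples()]
print("\n".join(lines))
```

If two sectors tie for the maximum in a year, idxmax returns the first occurrence, so only one of them is shown.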

Alternatively, use .sort_values + .tail, grouping on just Year. Data from @Scott Boston.

df.sort_values('Revenue').groupby('Year').tail(1)

Output:

               Sector  Year  Revenue
9          Heath Care  2014      224
3        Construction  2013      423
1                Tech  2011      466
12  Financial Service  2012      838
5             Telecom  2010      843
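The rows above come back in Revenue order because the frame was sorted before grouping. If year order is preferred, a second sort on Year can be chained on. A small sketch, assuming a made-up frame with two sectors per year (the values are hypothetical):

```python
import pandas as pd

# Hypothetical data: two sectors per year.
df = pd.DataFrame({
    'Sector': ['Telecom', 'Tech', 'Telecom', 'Tech'],
    'Year': [2010, 2010, 2011, 2011],
    'Revenue': [781, 520, 640, 973],
})

# Sort ascending by Revenue so the largest value in each year is that group's last row,
# keep the last row per year, then sort by Year for display.
top_per_year = (
    df.sort_values('Revenue')
      .groupby('Year')
      .tail(1)
      .sort_values('Year')
)
print(top_per_year)

# Cross-check against the idxmax + loc approach on the same data.
same_rows = top_per_year.equals(df.loc[df.groupby('Year')['Revenue'].idxmax()])
```

Unlike idxmax, which keeps the first row on a tie, this approach keeps whichever tied row ends up last after sorting.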

