Summarize values in pandas data frames
I want to calculate the maximum value for each year and show the sector and that value. For example, from the screenshot, I would like to display:
2010: Telecom 781
2011: Tech 973
I have tried using:
df.groupby(['Year', 'Sector'])['Revenue'].max()
but this does not give me the name of the sector that has the highest value.
Try using idxmax and loc:
df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]
MVCE:
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'Sector':['Telecom','Tech','Financial Service','Construction','Heath Care']*3,
'Year':[2010,2011,2012,2013,2014]*3,
'Revenue':np.random.randint(101,999,15)})
df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]
Output:
               Sector  Year  Revenue
3        Construction  2013      423
12  Financial Service  2012      838
9          Heath Care  2014      224
1                Tech  2011      466
5             Telecom  2010      843
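Since the question asks for the top sector per year, grouping on Year alone may be closer to what is wanted when a year contains several sectors. A minimal sketch, assuming a small hand-made frame (the values are hypothetical, apart from the 2010 Telecom 781 and 2011 Tech 973 figures quoted in the question) and a unique index:

```python
import pandas as pd

# Hypothetical data; only the two yearly maxima come from the question.
df = pd.DataFrame({
    'Sector': ['Telecom', 'Tech', 'Telecom', 'Tech'],
    'Year': [2010, 2010, 2011, 2011],
    'Revenue': [781, 500, 640, 973],
})

# For each year, idxmax gives the index label of the row with the largest
# Revenue; loc then pulls the full rows, so Sector comes along.
top = df.loc[df.groupby('Year')['Revenue'].idxmax()]

# Print in the "year: sector value" form asked for in the question.
for row in top.itertuples(index=False):
    print(f"{row.Year}: {row.Sector} {row.Revenue}")
```

If two sectors tie for the maximum in a year, idxmax keeps only the first one it finds.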
Also .sort_values + .tail, grouping on just the year. Data from @Scott Boston.
df.sort_values('Revenue').groupby('Year').tail(1)
Output:
               Sector  Year  Revenue
9          Heath Care  2014      224
3        Construction  2013      423
1                Tech  2011      466
12  Financial Service  2012      838
5             Telecom  2010      843
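Note that the rows come back ordered by Revenue, not by Year. If year order is wanted, one option is to sort again at the end. A minimal sketch on hypothetical data (the frame below is made up for illustration and is not the random data used above):

```python
import pandas as pd

# Hypothetical data, chosen so that Revenue order and Year order differ.
df = pd.DataFrame({
    'Sector': ['Telecom', 'Tech', 'Telecom', 'Tech', 'Telecom', 'Tech'],
    'Year': [2010, 2010, 2011, 2011, 2012, 2012],
    'Revenue': [781, 500, 640, 973, 300, 250],
})

top = (
    df.sort_values('Revenue')   # ascending, so each year's maximum comes last
      .groupby('Year')
      .tail(1)                  # keep the last (largest) row of every year
      .sort_values('Year')      # restore chronological order
)
print(top)
```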