pandas find max value in groupby and apply function
I've got a DataFrame df like the following:
H,Nu,City
1,15,Madrid
3,15,Madrid
3,1600,Madrid
5,17615,Madrid
2,55,Dublin
4,5706,Dublin
2,68,Dublin
1,68,Dublin
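To reproduce the question locally, here is a minimal sketch (assuming pandas is installed) that builds df from the CSV rows above using pd.read_csv on an io.StringIO buffer. The variable name csv_text is only illustrative.

```python
import io

import pandas as pd

# The CSV rows from the question, pasted verbatim
csv_text = """H,Nu,City
1,15,Madrid
3,15,Madrid
3,1600,Madrid
5,17615,Madrid
2,55,Dublin
4,5706,Dublin
2,68,Dublin
1,68,Dublin
"""

# read_csv assigns a default RangeIndex (0..7), so row labels equal row positions
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```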
I would like to find the maximum value of the Nu column for each city, then find the corresponding value of H and add a new column df['H2'] = df['H']/max(H/city), i.e. divide each row's H by the H of its city's max-Nu row. So far I tried:
d = df.groupby('City').apply(lambda t: t[t.Nu==t.Nu.max()])
which correctly returns:
          H     Nu    City
City
Dublin 5  4   5706  Dublin
Madrid 3  5  17615  Madrid
How can I store the H value of each city's max-Nu row (4 for Dublin and 5 for Madrid) as a per-city constant, so that I can apply the function to the whole DataFrame? The expected df would look like this:
H,Nu,City,H2
1,15,Madrid,0.2
3,15,Madrid,0.6
3,1600,Madrid,0.6
5,17615,Madrid,1.0
2,55,Dublin,0.5
4,5706,Dublin,1.0
2,68,Dublin,0.5
1,68,Dublin,0.25
Using .idxmax, you can find which row has the highest Nu value for each City:
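>>> # label of each city's max-Nu row, repeated on every row of that city
>>> i = df.groupby('City')['Nu'].transform('idxmax').values
>>> # look up H at those rows; .values drops the index so the division is positional
>>> df['H2'] = df['H'] / df.loc[i, 'H'].values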
>>> df
H Nu City H2
0 1 15 Madrid 0.20
1 3 15 Madrid 0.60
2 3 1600 Madrid 0.60
3 5 17615 Madrid 1.00
4 2 55 Dublin 0.50
5 4 5706 Dublin 1.00
6 2 68 Dublin 0.50
7 1 68 Dublin 0.25
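If you would rather keep the per-city constant as an explicit value, which is what the question asks for, here is an alternative sketch (not part of the original answer): compute the H at each city's max-Nu row once, as a Series indexed by City, and broadcast it back to every row with Series.map. The names max_rows and h_at_max are illustrative.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO(
    "H,Nu,City\n"
    "1,15,Madrid\n3,15,Madrid\n3,1600,Madrid\n5,17615,Madrid\n"
    "2,55,Dublin\n4,5706,Dublin\n2,68,Dublin\n1,68,Dublin\n"
))

# Index label of the max-Nu row in each city (first occurrence on ties)
max_rows = df.groupby('City')['Nu'].idxmax()

# Per-city constant: the H value in that row, indexed by City
h_at_max = df.loc[max_rows, ['City', 'H']].set_index('City')['H']
# Dublin -> 4, Madrid -> 5

# Map each row's City to its constant, then divide
df['H2'] = df['H'] / df['City'].map(h_at_max)
print(df)
```

Like transform('idxmax'), groupby(...).idxmax() returns the first matching label when a city's maximum Nu is tied, and the .loc lookup assumes df has a unique index.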