Pandas groupby -- get output value based on max value of another column
I have the following dataframe:
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Habitat': ['Jungle', 'Jungle', 'Sky', 'Sky'],
                   'Tmp': ['A', 'B', 'C', 'D'],
                   'Max Speed': [380., 370., 24., 26.]})
>>> df
Animal Habitat Tmp Max Speed
0 Falcon Jungle A 380.0
1 Falcon Jungle B 370.0
2 Parrot Sky C 24.0
3 Parrot Sky D 26.0
I am trying to add a column "Output" that, within each group formed by "Animal" and "Habitat", takes the value of "Tmp" from the row with the maximum "Max Speed".
Desired output:
Animal Habitat Tmp Max Speed Output
0 Falcon Jungle A 380.0 A
1 Falcon Jungle B 370.0 A
2 Parrot Sky C 24.0 D
3 Parrot Sky D 26.0 D
It can be done using a groupby and then joining the result back to the original dataset. But is there a more efficient way to do this? Maybe using transform or something else?
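For reference, the groupby-then-join route mentioned above could look roughly like the sketch below. It is only one possible reading of that approach; the names keys, idx, winners and joined are made up for illustration, and it assumes the dataframe has a unique index.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Habitat': ['Jungle', 'Jungle', 'Sky', 'Sky'],
                   'Tmp': ['A', 'B', 'C', 'D'],
                   'Max Speed': [380., 370., 24., 26.]})

keys = ['Animal', 'Habitat']  # illustrative name for the grouping columns

# Index label of the fastest row in each (Animal, Habitat) group
idx = df.groupby(keys)['Max Speed'].idxmax()

# One row per group: the keys plus the winning 'Tmp', renamed to 'Output'
winners = df.loc[idx, keys + ['Tmp']].rename(columns={'Tmp': 'Output'})

# Join the per-group result back onto every original row
joined = df.merge(winners, on=keys, how='left')
print(joined)
```

This builds an intermediate one-row-per-group frame and then merges it, which is the extra step the question hopes to avoid.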
You can define a function that takes a pd.DataFrame as its argument:
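import pandas as pd

def fmax(df_):
    # 'Tmp' from the row with the highest 'Max Speed' in this group, as a scalar
    best_tmp = df_.sort_values('Max Speed').tail(1)['Tmp'].squeeze()
    # Return a copy with the new column instead of mutating the group in place
    return df_.assign(Output=best_tmp)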
Note the use of squeeze (called here on a one-element Series, so it is pandas.Series.squeeze) to return a scalar value. Then simply apply the above function using groupby:
df.groupby(['Animal','Habitat'], group_keys=False).apply(fmax)
The result is:
Animal Habitat Tmp Max Speed Output
0 Falcon Jungle A 380.0 A
1 Falcon Jungle B 370.0 A
2 Parrot Sky C 24.0 D
3 Parrot Sky D 26.0 D
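Since the question asks about transform specifically, here is a minimal sketch of that route, offered as an alternative rather than as part of the answer above. It assumes the dataframe has a unique index, and best_idx is just an illustrative name. On ties, idxmax returns the first occurrence of the maximum.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Habitat': ['Jungle', 'Jungle', 'Sky', 'Sky'],
                   'Tmp': ['A', 'B', 'C', 'D'],
                   'Max Speed': [380., 370., 24., 26.]})

# For every row, the index label of the fastest row in its group
best_idx = df.groupby(['Animal', 'Habitat'])['Max Speed'].transform('idxmax')

# Look up 'Tmp' at those labels; map keeps the result aligned with df's index
df['Output'] = best_idx.map(df['Tmp'])
print(df)
```

This avoids both the per-group Python function and the explicit join, because transform broadcasts the per-group result back to the shape of the original column.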