Pandas Groupby - select row with highest value in one column if multiple rows exceed value in another
This operation groups my DataFrame by two columns, then returns the row with the highest value in ColumnC within each group:
df2 = df.loc[df.groupby(['columnA', 'columnB'], sort=False)['columnC'].idxmax()]
Instead, among the rows in each group where ColumnC > 100, I would like to take the row with the highest value in ColumnD.
How can I do this?
Edit:
The comment below by @Code Different is basically what I'm looking for, but I don't want to exclude groups where none of the rows have ColumnC > 100. In those cases I want the row with the highest value in ColumnC, as in the example above.
Usually we split the data into two parts, filter each by its own condition, then combine them:
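import pandas as pd
# Rows with columnC > 100: keep the row with the highest columnD in each group
df1 = df[df['columnC'] > 100].sort_values('columnD').drop_duplicates(['columnA', 'columnB'], keep='last')
# Fallback for every group: the row with the highest columnC
df2 = df.sort_values('columnC').drop_duplicates(['columnA', 'columnB'], keep='last')
# df1 comes first, so its rows win; df2 only fills groups missing from df1
Yourdf = pd.concat([df1, df2]).drop_duplicates(['columnA', 'columnB'])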
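To check the behaviour on both kinds of groups, here is a small self-contained sketch of the same split-and-combine idea. The sample data, the helper name `pick_rows`, and its `threshold` parameter are made up for illustration; only the column names come from the question.

```python
import pandas as pd

def pick_rows(df, threshold=100):
    """Hypothetical helper: one row per (columnA, columnB) group."""
    keys = ['columnA', 'columnB']
    # Among rows above the threshold, keep the highest columnD per group
    above = (df[df['columnC'] > threshold]
             .sort_values('columnD')
             .drop_duplicates(keys, keep='last'))
    # Fallback for every group: the row with the highest columnC
    fallback = df.sort_values('columnC').drop_duplicates(keys, keep='last')
    # 'above' is listed first, so the fallback only fills missing groups
    return pd.concat([above, fallback]).drop_duplicates(keys)

# Made-up sample: group ('x', 1) has rows above 100, group ('y', 2) has none
df = pd.DataFrame({
    'columnA': ['x', 'x', 'x', 'y', 'y'],
    'columnB': [1, 1, 1, 2, 2],
    'columnC': [150, 120, 90, 50, 80],
    'columnD': [5, 9, 20, 7, 3],
})

result = pick_rows(df)
print(result)
```

For group ('x', 1) this picks the row with columnC = 120 and columnD = 9, since the row with columnD = 20 does not pass the threshold. For group ('y', 2) no row passes, so it falls back to the row with the highest columnC (80). If a group has ties on the sort column, which of the tied rows is kept is not guaranteed by this sketch.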