Select the max value for each group
So I have a pandas data frame with multiple columns and an id column.
df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']
df
I want to group by the id column and, for each id, find the highest probability across columns A-D, reporting which column it came from and the technique of the row it is in. So it could look like this:
id highest_prob technique
CA B three
FL C one
I tried something like this, but it only gets me halfway:
df.groupby('id', as_index=False)[['A','B','C','D']].max()
Does anyone have suggestions on how I can get the desired result?
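The attempt above keeps only the per-column maximum of each group, so it loses which column and which row (technique) produced the overall maximum. One way to finish it without melting is sketched below; this variant is not from the original thread, and it assumes the same seeded data as the Setup used by the answers. It takes the row-wise max and idxmax(axis=1) over A-D, then uses groupby(...).idxmax() to pick the best row for each id.

```python
import numpy as np
import pandas as pd

# Same seeded data as the Setup, so the result is reproducible.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']

cols = ['A', 'B', 'C', 'D']
# For every row: the largest value and the name of the column holding it.
row_max = df[cols].max(axis=1)
row_argmax = df[cols].idxmax(axis=1)

# Within each id, the row label whose row maximum is largest.
best_rows = row_max.groupby(df['id']).idxmax()

result = pd.DataFrame({
    'id': best_rows.index.to_numpy(),
    'highest_prob': row_argmax.loc[best_rows.to_numpy()].to_numpy(),
    'technique': df.loc[best_rows.to_numpy(), 'technique'].to_numpy(),
})
print(result)
```

Because idxmax returns the first occurrence of the maximum, ties are broken in favour of the earlier column (within a row) or the earlier row (within a group).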
Setup
import numpy as np
import pandas as pd

np.random.seed(0)  # Seed the generator so the results are reproducible.
df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']
You could melt, sort with sort_values, and drop duplicates using drop_duplicates:
(df.melt(['id', 'technique'])
.sort_values(['id', 'value'], ascending=[True, False])
.drop_duplicates('id')
.drop(columns='value')  # pandas 2.x no longer accepts axis positionally
.reset_index(drop=True)
.rename({'variable': 'highest_prob'}, axis=1))
id technique highest_prob
0 CA one D
1 FL two A
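To see why the sort-then-drop_duplicates chain works, here is a minimal sketch on a tiny hand-made frame (the values are hypothetical, chosen only so the result is easy to check by eye). melt turns every (row, column) cell into its own row; after sorting by id and then by value descending, the first row of each id holds its maximum, and drop_duplicates keeps exactly that row.

```python
import pandas as pd

# Hypothetical toy data: two ids, two techniques, two value columns.
df = pd.DataFrame({
    'A': [0.1, 0.9, 0.3, 0.2],
    'B': [0.5, 0.2, 0.8, 0.1],
    'id': ['CA', 'CA', 'FL', 'FL'],
    'technique': ['one', 'two', 'one', 'two'],
})

# Long format: columns id, technique, variable (old column name), value.
long = df.melt(id_vars=['id', 'technique'])
print(long.shape)  # 4 rows x 2 value columns -> 8 rows, 4 columns

best = (long.sort_values(['id', 'value'], ascending=[True, False])
            .drop_duplicates(subset='id')  # keep the first (largest) row per id
            .drop(columns='value')
            .reset_index(drop=True)
            .rename(columns={'variable': 'highest_prob'}))
print(best)
```

For CA the largest cell is 0.9 (column A, technique two); for FL it is 0.8 (column B, technique one).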
Another solution is to use melt and groupby:
v = df.melt(['id', 'technique'])
(v.loc[v.groupby('id').value.idxmax()]  # idxmax returns index labels, so look them up with loc
.drop(columns='value')
.reset_index(drop=True)
.rename({'variable': 'highest_prob'}, axis=1))
id technique highest_prob
0 CA one D
1 FL two A
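The groupby variant relies on idxmax returning, for each id, the index label of the largest value rather than its position. A minimal sketch on hypothetical toy values shows those labels and the label-based lookup with .loc; iloc gives the same answer after melt only because melt produces a default RangeIndex, where labels and positions coincide.

```python
import pandas as pd

# Hypothetical toy data, same shape of problem as the question.
df = pd.DataFrame({
    'A': [0.1, 0.9, 0.3, 0.2],
    'B': [0.5, 0.2, 0.8, 0.1],
    'id': ['CA', 'CA', 'FL', 'FL'],
    'technique': ['one', 'two', 'one', 'two'],
})

v = df.melt(id_vars=['id', 'technique'])

# One index label per id: the row of v holding that id's maximum value.
labels = v.groupby('id')['value'].idxmax()
print(labels.tolist())

best = (v.loc[labels.to_numpy()]  # label-based lookup of the winning rows
          .drop(columns='value')
          .reset_index(drop=True)
          .rename(columns={'variable': 'highest_prob'}))
print(best)
```

If several cells share a group's maximum, idxmax picks the first occurrence in v's row order.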