Select the max value of each group

Question

I have a pandas DataFrame with several value columns and an id column.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']
df

I want to group by the id column and, for each group, select the row that has the highest probability, along with the column that value came from. The result could look like this:

id   highest_prob   technique
CA   B               three 
FL   C               one

I tried something like this, but it only gets me halfway: it returns the maximum of each column per group, not which column or technique produced the overall maximum.

df.groupby('id', as_index=False)[['A','B','C','D']].max()

Does anyone have suggestions on how I can get the desired result?

Answer 1

Setup

import numpy as np
import pandas as pd

np.random.seed(0)  # Add seed to reproduce results.
df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']

You could reshape to long format with melt, sort with sort_values so that the largest value in each group comes first, and keep only that first row per id using drop_duplicates:

(df.melt(['id', 'technique'])
   .sort_values(['id', 'value'], ascending=[True, False])
   .drop_duplicates('id')
   .drop(columns='value')  # the positional form drop('value', 1) fails on pandas 2.0+
   .reset_index(drop=True)
   .rename({'variable': 'highest_prob'}, axis=1))

   id technique highest_prob
0  CA       one            D
1  FL       two            A

Another solution is to use melt and groupby:

v = df.melt(['id', 'technique'])
# idxmax returns index labels, so select with .loc rather than .iloc
(v.loc[v.groupby('id').value.idxmax()]
  .drop(columns='value')
  .reset_index(drop=True)
  .rename({'variable': 'highest_prob'}, axis=1))

   id technique highest_prob
0  CA       one            D
1  FL       two            A
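
As a possible alternative to reshaping, the same result can be sketched without melt: take the per-row maximum and its column name first, then pick the best row in each group. This is not from the answer above. It is a minimal sketch that assumes the value columns are exactly A to D, and the names cols, row_best, winners and result are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # same seed as the setup above
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
df['id'] = ['CA', 'CA', 'CA', 'FL', 'FL', 'FL']
df['technique'] = ['one', 'two', 'three', 'one', 'two', 'three']

cols = ['A', 'B', 'C', 'D']  # assumed to be the probability columns

# For each row, record the largest value and the column it came from.
row_best = df[['id', 'technique']].assign(
    highest_prob=df[cols].idxmax(axis=1),
    value=df[cols].max(axis=1),
)

# Index label of the best row within each id group.
winners = row_best.groupby('id')['value'].idxmax()

# idxmax returns labels, so select with .loc.
result = (row_best.loc[winners]
          .drop(columns='value')
          .reset_index(drop=True))
print(result)
```

With the seed above this should agree with the melt-based output (CA / one / D and FL / two / A). It avoids building the long-format frame, which may matter when there are many value columns.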
