How to get dataframe of unique ids

Question

I'm trying to group the following dataframe by unique binId and then, within each group, pick the row with the highest value of 'z'. Here is my dataframe:

import pandas as pd
df = pd.DataFrame({'ID':['1','2','3','4','5','6'], 'binId': ['1','2','2','1','1','3'], 'x':[1,4,5,6,3,4], 'y':[11,24,35,16,23,34],'z':[1,4,5,2,3,4]})

I tried the following code, which gives the required answer:

def f(x):
    tp = df[df['binId'] == x][['binId','ID','x','y','z']].sort_values(by='z', ascending=False).iloc[0]
    return tp

and then,

binids= pd.Series(df.binId.unique())
print(binids.apply(f))

The output is,

  binId ID  x   y  z
0     1  5  3  23  3
1     2  3  5  35  5
2     3  6  4  34  4

But the execution is too slow. What is a faster way of doing this?

Answer 1

Use idxmax to get the index label of the maximum 'z' in each group, then select those rows with loc:

df1 = df.loc[df.groupby('binId')['z'].idxmax()]
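
To make the intermediate step visible, here is a minimal sketch on the question's sample data. The name max_idx is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': ['1', '2', '3', '4', '5', '6'],
    'binId': ['1', '2', '2', '1', '1', '3'],
    'x': [1, 4, 5, 6, 3, 4],
    'y': [11, 24, 35, 16, 23, 34],
    'z': [1, 4, 5, 2, 3, 4],
})

# For each binId, the index label of the row holding the largest z.
# max_idx is a hypothetical helper name used only for this sketch.
max_idx = df.groupby('binId')['z'].idxmax()
print(max_idx)

# Passing those labels to loc returns one full row per binId.
df1 = df.loc[max_idx]
print(df1)
```

The grouped idxmax call yields a Series indexed by binId, so the selection happens in a single loc lookup instead of one filtered sort per bin.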

Or, faster, use sort_values with drop_duplicates:

df1 = df.sort_values(['binId', 'z']).drop_duplicates('binId', keep='last')

print (df1)
  ID binId  x   y  z
4  5     1  3  23  3
2  3     2  5  35  5
5  6     3  4  34  4
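
As a quick sanity check, the sketch below runs both approaches on the sample data and confirms they pick the same rows. It then reorders the columns and resets the index to mirror the layout printed in the question. The names by_idxmax, by_sort and result are illustrative only. Note that no binId in this sample has a tied maximum 'z'; with ties, the two approaches may select different rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': ['1', '2', '3', '4', '5', '6'],
    'binId': ['1', '2', '2', '1', '1', '3'],
    'x': [1, 4, 5, 6, 3, 4],
    'y': [11, 24, 35, 16, 23, 34],
    'z': [1, 4, 5, 2, 3, 4],
})

# Approach 1: look up the index label of the max z per binId.
by_idxmax = df.loc[df.groupby('binId')['z'].idxmax()]

# Approach 2: sort so the largest z comes last within each binId, keep that row.
by_sort = df.sort_values(['binId', 'z']).drop_duplicates('binId', keep='last')

# On this sample (no tied maxima) both approaches select the same rows.
assert by_idxmax['ID'].tolist() == by_sort['ID'].tolist()

# Optional: match the column order and 0..n-1 index shown in the question.
result = by_sort[['binId', 'ID', 'x', 'y', 'z']].reset_index(drop=True)
print(result)
```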

1 answer

solution1
1 ACCEPTED 2018-03-09 12:46:08
