
How to get dataframe of unique ids

I'm trying to group the following DataFrame by unique binId and then, within each group, pick the row with the highest value of 'z'. Here is my DataFrame:

import pandas as pd
df = pd.DataFrame({'ID':['1','2','3','4','5','6'], 'binId': ['1','2','2','1','1','3'], 'x':[1,4,5,6,3,4], 'y':[11,24,35,16,23,34],'z':[1,4,5,2,3,4]})

I tried the following code, which gives the required answer:

def f(x):
    # Select the rows for this binId and return the one with the largest z
    tp = df[df['binId'] == x][['binId','ID','x','y','z']].sort_values(by='z', ascending=False).iloc[0]
    return tp

and then,

binids = pd.Series(df.binId.unique())
print(binids.apply(f))

The output is:

  binId ID  x   y  z
0     1  5  3  23  3
1     2  3  5  35  5
2     3  6  4  34  4

But the execution is too slow. What is a faster way to do this?

Use idxmax to get the index label of the row with the maximum z in each binId group, then select those rows with loc:

df1 = df.loc[df.groupby('binId')['z'].idxmax()]
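A self-contained version of the idxmax approach, using the sample DataFrame from the question, might look like the sketch below. It assumes df has a unique index (true for the default RangeIndex here), because loc selects by label and duplicate labels would pull in extra rows. The name max_idx is just illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': ['1', '2', '3', '4', '5', '6'],
                   'binId': ['1', '2', '2', '1', '1', '3'],
                   'x': [1, 4, 5, 6, 3, 4],
                   'y': [11, 24, 35, 16, 23, 34],
                   'z': [1, 4, 5, 2, 3, 4]})

# For each binId, idxmax returns the index label of the row with the largest z.
max_idx = df.groupby('binId')['z'].idxmax()
print(max_idx.tolist())  # [4, 2, 5]

# loc with those labels selects the full rows in one vectorized step,
# instead of filtering the whole frame once per binId.
df1 = df.loc[max_idx]
print(df1)
```

If several rows in a bin share the maximum z, idxmax returns the first of them.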

Or, faster, sort with sort_values and keep the last row per binId with drop_duplicates:

df1 = df.sort_values(['binId', 'z']).drop_duplicates('binId', keep='last')

print(df1)
  ID binId  x   y  z
4  5     1  3  23  3
2  3     2  5  35  5
5  6     3  4  34  4
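The sort_values/drop_duplicates variant can be checked the same way. This sketch reuses the question's data and compares the selected row labels with the idxmax result; the names ordered, df1 and df2 are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': ['1', '2', '3', '4', '5', '6'],
                   'binId': ['1', '2', '2', '1', '1', '3'],
                   'x': [1, 4, 5, 6, 3, 4],
                   'y': [11, 24, 35, 16, 23, 34],
                   'z': [1, 4, 5, 2, 3, 4]})

# Sort by binId, then by z ascending, so that within each binId block
# the row with the largest z comes last.
ordered = df.sort_values(['binId', 'z'])

# keep='last' keeps only the final row of each binId block, i.e. its max-z row.
df1 = ordered.drop_duplicates('binId', keep='last')
print(df1)

# Cross-check against the idxmax approach: same row labels for this data.
df2 = df.loc[df.groupby('binId')['z'].idxmax()]
print(df1.index.tolist(), df2.index.tolist())  # [4, 2, 5] [4, 2, 5]
```

The two approaches can pick different rows when several rows in one binId tie for the maximum z: idxmax returns the first such row, while keep='last' keeps whichever tied row the sort places last.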
