I'm trying to group the following dataframe by unique binId, then compare the rows in each group on 'z' and pick the row with the highest value of 'z'. Here is my dataframe:
import pandas as pd
df = pd.DataFrame({'ID':['1','2','3','4','5','6'], 'binId': ['1','2','2','1','1','3'], 'x':[1,4,5,6,3,4], 'y':[11,24,35,16,23,34],'z':[1,4,5,2,3,4]})
I tried the following code, which gives the required answer:
def f(x):
    # Select the rows for this binId, sort them by z descending and take the top row.
    tp = df[df['binId'] == x][['binId','ID','x','y','z']].sort_values(by='z', ascending=False).iloc[0]
    return tp
and then:
binids = pd.Series(df.binId.unique())
print(binids.apply(f))
The output is:
  binId ID  x   y  z
0     1  5  3  23  3
1     2  3  5  35  5
2     3  6  4  34  4
But execution is too slow. What is a faster way of doing this?
Use idxmax to get the index label of the maximum z in each group, then select those rows with loc:
df1 = df.loc[df.groupby('binId')['z'].idxmax()]
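To make the idxmax step concrete, here is a minimal sketch that rebuilds the question's dataframe and prints the intermediate Series before it is passed to loc. The names `idx` and `df1` are just for illustration; the calls are standard pandas.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3', '4', '5', '6'],
                   'binId': ['1', '2', '2', '1', '1', '3'],
                   'x': [1, 4, 5, 6, 3, 4],
                   'y': [11, 24, 35, 16, 23, 34],
                   'z': [1, 4, 5, 2, 3, 4]})

# For each binId, the row label (not position) where z is largest.
idx = df.groupby('binId')['z'].idxmax()
print(idx)  # bin '1' -> 4, bin '2' -> 2, bin '3' -> 5

# loc with a list of labels returns those rows, in that order.
df1 = df.loc[idx]
print(df1)
```

Because idxmax returns labels, this relies on df having a unique index; if a bin has several rows tied for the max z, idxmax keeps the first one.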
Or, faster, use sort_values with drop_duplicates:
df1 = df.sort_values(['binId', 'z']).drop_duplicates('binId', keep='last')
print (df1)
  ID binId  x   y  z
4  5     1  3  23  3
2  3     2  5  35  5
5  6     3  4  34  4
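Which of the two is faster depends on data size and pandas version, so it is worth measuring on your own data. A rough sketch, assuming a hypothetical frame of 100,000 rows in 1,000 bins with unique z values (so there are no ties and both methods must select exactly the same rows):

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data; z is a permutation, so every z is unique.
rng = np.random.default_rng(0)
n_rows, n_bins = 100_000, 1_000
big = pd.DataFrame({
    'binId': rng.integers(0, n_bins, size=n_rows),
    'z': rng.permutation(n_rows),
})


def via_idxmax(frame):
    # Row label of the max z in each bin, then select those rows.
    return frame.loc[frame.groupby('binId')['z'].idxmax()]


def via_sort_dedup(frame):
    # Sort so the max z is the last row of each bin, then keep that row.
    return frame.sort_values(['binId', 'z']).drop_duplicates('binId', keep='last')


a = via_idxmax(big)
b = via_sort_dedup(big)
# Both results are ordered by binId, so with unique z they should match exactly.
print(a.index.equals(b.index))

for fn in (via_idxmax, via_sort_dedup):
    seconds = timeit.timeit(lambda: fn(big), number=10)
    print(f'{fn.__name__}: {seconds / 10:.4f} s per call')
```

Either way, both replace the question's per-bin Python loop (which filters and sorts the whole frame once per binId) with a single vectorised pass.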