
Selecting rows with the lowest values based on a combination of two columns in pandas

I'm not even sure if the title makes sense.

I have a pandas dataframe with 3 columns: x, y, time. There are a few thousand rows. Example below:

       x      y    time
0     225     0  20.295270
1     225     1  21.134015
2     225     2  21.382298
3     225     3  20.704367
4     225     4  20.152735
5     225     5  19.213522
.......
900   437   900  27.748966
901   437   901  20.898460
902   437   902  23.347935
903   437   903  22.011992
904   437   904  21.231041
905   437   905  28.769945
906   437   906  21.662975
.... and so on

What I want to do is retrieve the rows that have the smallest time for each y. In other words, for every value of y, I want to find the row with the smallest time value, but I want to exclude rows where time is 0.0. That happens when x has the same value as y.

So, for example, the fastest way to get to y = 0 is by starting from x = 225, and so on. The same x may therefore appear more than once, but for different y values.

e.g. 
x      y    time
225     0  20.295270
438     1  19.648954
27     20   4.342732
9     438  17.884423
225   907  24.560400

So far I have tried groupby, but I only get the rows where x is the same as y.

print(df.groupby('y', sort=False)['time'].idxmin())

y
0        0
1        1
2        2
3        3
4        4

The one below just returns the df that I already have.

df.loc[df.groupby("y")["time"].idxmin()]

Just to point out one thing: I'm open to other options, not just groupby.

You first need to remove the rows where time equals 0 using boolean indexing, and then use your solution:

df = df[df['time'] != 0]
df2 = df.loc[df.groupby("y")["time"].idxmin()]
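
To see why the filter matters, here is a minimal sketch on a made-up frame. The values are hypothetical and only loosely follow the question's example; the names nonzero and fastest are illustrative. Without the filter, the rows with time 0.0 (where x equals y) would win every group.

```python
import pandas as pd

# Hypothetical sample data: time is 0.0 wherever x equals y.
df = pd.DataFrame({
    "x":    [0, 225, 438, 1, 225, 438],
    "y":    [0, 0, 0, 1, 1, 1],
    "time": [0.0, 20.295270, 25.1, 0.0, 21.134015, 19.648954],
})

# Drop the zero-time rows first, so they cannot be picked as the minimum.
nonzero = df[df["time"] != 0]

# idxmin returns the index label of the smallest time within each y group;
# loc then pulls the full rows (x, y, time) for those labels.
fastest = nonzero.loc[nonzero.groupby("y")["time"].idxmin()]
print(fastest)
```

In this sketch the filtered frame keeps its original index labels, which is why the labels returned by idxmin can be passed straight to loc.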

A similar alternative filters with query:

df = df.query('time != 0')
df2 = df.loc[df.groupby("y")["time"].idxmin()]

Or use sort_values with drop_duplicates:

df2 = df[df['time'] != 0].sort_values(['y','time']).drop_duplicates('y')
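
As a rough check that the three variants agree, the sketch below runs them on the same made-up frame and compares the results. The data and the variable names (by_idxmin, by_query, by_sort) are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: time is 0.0 wherever x equals y.
df = pd.DataFrame({
    "x":    [0, 225, 438, 1, 225, 438],
    "y":    [0, 0, 0, 1, 1, 1],
    "time": [0.0, 20.295270, 25.1, 0.0, 21.134015, 19.648954],
})

# Variant 1: boolean mask, then groupby + idxmin.
by_idxmin = df[df["time"] != 0]
by_idxmin = by_idxmin.loc[by_idxmin.groupby("y")["time"].idxmin()]

# Variant 2: same idea, but the filter is written with query.
by_query = df.query("time != 0")
by_query = by_query.loc[by_query.groupby("y")["time"].idxmin()]

# Variant 3: sort by y and time, then keep the first row of each y.
by_sort = df[df["time"] != 0].sort_values(["y", "time"]).drop_duplicates("y")

# DataFrame.equals compares values, dtypes, and index labels.
print(by_idxmin.equals(by_query))
print(by_idxmin.equals(by_sort))
```

One difference worth keeping in mind: the groupby variants return rows ordered by the group key, while the sort_values variant returns rows in sorted order, which happens to be the same here because the sort starts with y.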
