
Removing duplicates from pandas data frame with condition based on another column

Suppose I have the following DataFrame:

Row | Temperature | Measurement
A1  | 26.7        | 12
A1  | 25.7        | 13
A2  | 27.3        | 11
A2  | 28.3        | 12
A3  | 25.6        | 17
A3  | 23.4        | 14
... | ...         | ...
P3  | 25.7        | 14

I want to remove duplicate rows with respect to the column 'Row', keeping for each value of 'Row' only the row whose 'Temperature' is closest to 25. The expected result is:

Row | Temperature | Measurement
A1  | 25.7        | 13
A2  | 27.3        | 11
A3  | 25.6        | 17
... | ...         | ...
P3  | 25.7        | 14

I am trying to use this function, which finds the element of an array nearest to a given value:

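    import numpy as np

    def find_nearest(array, value):  # illustrative name for the helper
        # return the element of `array` closest to `value`
        array = np.asarray(array)
        idx = (np.abs(array - value)).argmin()
        return array[idx]

    array = df['Temperature']
    value = 25
    find_nearest(array, value)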

But I am not sure how to combine this with pandas' drop_duplicates on the DataFrame. Thank you!
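A minimal sketch of how the nearest-value function above could be applied per 'Row' group instead of to the whole column. It assumes the DataFrame contains only the rows shown (the elided rows are not reproduced), and find_nearest is an illustrative name for the helper, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd

# Sample data: only the rows visible in the question (elided rows omitted).
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

def find_nearest(array, value):
    """Hypothetical helper: return the element of `array` closest to `value`."""
    array = np.asarray(array)
    idx = np.abs(array - value).argmin()
    return array[idx]

# Apply the helper per 'Row' group; transform broadcasts the group's
# nearest temperature back onto every row of that group.
nearest = df.groupby("Row")["Temperature"].transform(lambda s: find_nearest(s, 25))

# Keep rows whose temperature equals their group's nearest value;
# drop_duplicates guards against a group containing that value more than once.
result = df[df["Temperature"] == nearest].drop_duplicates("Row")
print(result)
```

The boolean mask preserves the original row order, so no re-sorting is needed afterwards.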


One way to do it is to create a temporary column holding the distance from 25, sort on that, and then drop duplicates:

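    # distance of each temperature from the target value 25
    df['key'] = df['Temperature'].sub(25).abs()

    # sort by key so the closest row of each 'Row' comes first,
    # drop duplicates (keeps the first occurrence by default), and restore the original order
    df.sort_values('key').drop_duplicates('Row').sort_index()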

Output:

  Row  Temperature  Measurement  key
1  A1         25.7           13  0.7
2  A2         27.3           11  2.3
4  A3         25.6           17  0.6
6  P3         25.7           14  0.7
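For a self-contained run of the temporary-column approach, here is a sketch on the rows shown in the question (elided rows omitted). Two small additions are not part of the answer above: kind="stable" in sort_values, so that rows with equal distance resolve to the one that appears first (the default quicksort is not stable), and dropping the helper column at the end.

```python
import pandas as pd

# Sample data: only the rows visible in the question (elided rows omitted).
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

# Temporary helper column: distance from the target value 25.
df["key"] = df["Temperature"].sub(25).abs()

result = (
    df.sort_values("key", kind="stable")  # closest rows first; ties keep original order
      .drop_duplicates("Row")             # keep="first" is the default
      .sort_index()                       # restore the original row order
      .drop(columns="key")                # remove the helper column
)
print(result)
```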

Another option, closer to what you are trying to do, is to use np.argsort on the distance and reorder the rows with iloc. This avoids creating a new column in the data:

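    import numpy as np

    # row positions, ordered from closest to 25 to farthest
    orders = np.argsort(df['Temperature'].sub(25).abs())

    # reorder by position, keep the first (closest) row per 'Row', restore the original order
    df.iloc[orders].drop_duplicates('Row').sort_index()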

Output:

  Row  Temperature  Measurement
1  A1         25.7           13
2  A2         27.3           11
4  A3         25.6           17
6  P3         25.7           14
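A different technique, not the one used in this answer: compute the distance once, then use groupby with idxmin to pick the index label of the closest row for each 'Row' directly, without sorting the whole frame. This is a sketch on the rows shown in the question (elided rows omitted); on ties, idxmin returns the first occurrence.

```python
import pandas as pd

# Sample data: only the rows visible in the question (elided rows omitted).
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

# Distance from 25 for every row (no new column is added to df).
dist = df["Temperature"].sub(25).abs()

# For each 'Row' value, the index label of the row with the smallest distance.
best = dist.groupby(df["Row"]).idxmin()

# Select those rows by label and keep the original row order.
result = df.loc[best].sort_index()
print(result)
```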
