Removing duplicates from pandas data frame with condition based on another column

Question

Assuming I have the following DataFrame:

Row | Temperature | Measurement
 A1 | 26.7        | 12
 A1 | 25.7        | 13
 A2 | 27.3        | 11
 A2 | 28.3        | 12
 A3 | 25.6        | 17
 A3 | 23.4        | 14
 ----------------------------
 P3 | 25.7        | 14

I want to remove the duplicate rows with respect to column 'Row', and retain only the row whose value in column Temperature is closest to 25. For example:

Row | Temperature | Measurement
 A1 | 25.7        | 13
 A2 | 27.3        | 11
 A3 | 25.6        | 17
 ----------------------------
 P3 | 25.7        | 14

I am trying to use this function to find the nearest within an array:

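import numpy as np

def find_nearest(array, value):  # function name assumed; the original header is missing
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]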

array = df['Temperature']
value = 25

But I am not sure how to combine this with drop_duplicates on the df. Thank you!

python pandas dataframe

Answer 1

One way to do this is to create a temporary column, sort on it, and then drop duplicates:

df['key'] = df['Temperature'].sub(25).abs()

# sort by key, keep the first (closest) row per 'Row', then restore the original order
df.sort_values('key').drop_duplicates('Row').sort_index()

Output:

  Row  Temperature  Measurement  key
1  A1         25.7           13  0.7
2  A2         27.3           11  2.3
4  A3         25.6           17  0.6
6  P3         25.7           14  0.7
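
For reference, here is a self-contained sketch of the same idea. The sample frame is rebuilt from the question under the assumption that P3 sits at index 6, as the output above suggests; `target` and `result` are illustrative names, not part of the original answer. It uses `assign` so the helper column never lands on the original df, and drops it again at the end:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: P3 sits at index 6)
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

target = 25  # illustrative name for the reference temperature

result = (
    df.assign(key=df["Temperature"].sub(target).abs())  # distance to target
      .sort_values("key")        # closest rows first
      .drop_duplicates("Row")    # keep the first (closest) row per 'Row'
      .sort_index()              # back to the original row order
      .drop(columns="key")       # discard the helper column
)
print(result)
```

Because `drop_duplicates` keeps the first occurrence by default, sorting by distance first is what makes the closest row the one that survives.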

Another option, similar to what you are trying to do, is to use np.argsort on the key and reorder the rows with iloc. This avoids creating a new column in the data:

orders = np.argsort(df['Temperature'].sub(25).abs())

df.iloc[orders].drop_duplicates('Row').sort_index()

Output:

  Row  Temperature  Measurement
1  A1         25.7           13
2  A2         27.3           11
4  A3         25.6           17
6  P3         25.7           14
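
A further variant, not part of the answer above and offered only as a sketch, skips sorting entirely: group the distances by 'Row' and take the index label of the minimum in each group with `idxmin`. The sample frame is again rebuilt from the question on the assumption that P3 sits at index 6, and `dist`, `closest` and `result` are illustrative names:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: P3 sits at index 6)
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

# Absolute distance of each temperature from the target value 25
dist = df["Temperature"].sub(25).abs()

# For each 'Row' group, idxmin returns the index label of the smallest distance
closest = dist.groupby(df["Row"]).idxmin()

# Select those rows by label and keep the original row order
result = df.loc[closest].sort_index()
print(result)
```

If two rows in the same group are equally close to 25, `idxmin` returns the first of them, so this variant favors the earlier row in that case.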
