Assuming I have the following DataFrame:
Row | Temperature | Measurement
A1 | 26.7 | 12
A1 | 25.7 | 13
A2 | 27.3 | 11
A2 | 28.3 | 12
A3 | 25.6 | 17
A3 | 23.4 | 14
----------------------------
P3 | 25.7 | 14
I want to remove the duplicate rows with respect to the column 'Row', retaining for each 'Row' value only the row whose Temperature is closest to 25. For example:
Row | Temperature | Measurement
A1 | 25.7 | 13
A2 | 27.3 | 11
A3 | 25.6 | 17
----------------------------
P3 | 25.7 | 14
I am trying to use this function to find the nearest value within an array:
import numpy as np
def find_nearest(array, value):
    # return the element of array that is closest to value
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]
array = df['Temperature']
value = 25
But I am not sure how to go about using pandas.drop_duplicates on the df. Thank you!
python
pandas
dataframe
One way to do this is to create a temporary column, sort on it, and then drop duplicates:
df['key'] = df['Temperature'].sub(25).abs()
# sort by key, drop duplicates (keeps the closest row per 'Row'), then re-sort by index
df.sort_values('key').drop_duplicates('Row').sort_index()
Output:
Row Temperature Measurement key
1 A1 25.7 13 0.7
2 A2 27.3 11 2.3
4 A3 25.6 17 0.6
6 P3 25.7 14 0.7
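For reference, here is a minimal self-contained sketch of the same idea. It assumes the sample data from the question (leaving out the elided middle rows) and uses assign plus a final drop so the helper column never ends up in df. The names target and result are only illustrative, and kind="stable" is an optional addition that makes ties keep their original order:

```python
import pandas as pd

# Sample data reconstructed from the question (elided middle rows omitted).
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

target = 25  # illustrative name: the temperature to stay closest to

result = (
    df.assign(key=df["Temperature"].sub(target).abs())  # temporary distance column
      .sort_values("key", kind="stable")                # closest rows first
      .drop_duplicates("Row")                           # keep the first (closest) row per 'Row'
      .sort_index()                                     # restore the original row order
      .drop(columns="key")                              # discard the helper column
)
print(result)
```

Because drop_duplicates keeps the first occurrence by default, sorting by distance beforehand is what guarantees that the surviving row for each 'Row' value is the closest one.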
Another option, similar to what you are trying to do, is to use np.argsort on the key and reorder the rows with iloc. This avoids creating a new column in the data:
orders = np.argsort(df['Temperature'].sub(25).abs())
df.iloc[orders].drop_duplicates('Row').sort_index()
Output:
Row Temperature Measurement
1 A1 25.7 13
2 A2 27.3 11
4 A3 25.6 17
6 P3 25.7 14
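A runnable sketch of this variant is below, again assuming the sample data from the question without the elided rows. Converting the distances with to_numpy() before calling np.argsort is my own choice to make it explicit that the result holds integer positions for iloc; the names distance, order and result are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (elided middle rows omitted).
df = pd.DataFrame({
    "Row": ["A1", "A1", "A2", "A2", "A3", "A3", "P3"],
    "Temperature": [26.7, 25.7, 27.3, 28.3, 25.6, 23.4, 25.7],
    "Measurement": [12, 13, 11, 12, 17, 14, 14],
})

# Distance of every row from the target temperature, as a plain NumPy array.
distance = df["Temperature"].sub(25).abs().to_numpy()

# Integer positions that order the rows from closest to farthest.
order = np.argsort(distance, kind="stable")

# Reorder by position, keep the first (closest) row per 'Row', restore index order.
result = df.iloc[order].drop_duplicates("Row").sort_index()
print(result)
```

Here df itself is never modified, so no cleanup of a helper column is needed afterwards.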