
Pandas: Check if row has similar values

I'm generating an overlay for a map using pandas and used:

if ((df['latitude'] == new_latitude) & (df['longitude'] == new_longitude)).any():
   continue

to make sure that I don't produce duplicate points. But I am starting to produce points that differ by 0.001 (in longitude, latitude, or both) from one already produced. How can I prevent this in a similar manner to the above?

IIUC, you can subtract the new value from the entire series and then filter the points:

thresh = 0.001
# keep only the values that differ from the new point by more than thresh
lat = df.loc[(df['latitude'] - new_latitude).abs() > thresh, 'latitude']
lon = df.loc[(df['longitude'] - new_longitude).abs() > thresh, 'longitude']

This uses abs to get the absolute difference, builds a boolean mask from it, and filters out all the duplicate and near-duplicate values.
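The two filters above act on latitude and longitude independently. To get a single yes/no answer like the check in the question, the two masks can be combined so that a row counts as a near-duplicate only when both coordinates are within the threshold. A minimal sketch, assuming the column names from the question; the helper name is_near_duplicate and the sample data are made up for illustration:

```python
import pandas as pd

def is_near_duplicate(df, new_latitude, new_longitude, thresh=0.001):
    """Hypothetical helper: True if some existing row is within
    thresh of the new point in BOTH latitude and longitude."""
    lat_close = (df['latitude'] - new_latitude).abs() <= thresh
    lon_close = (df['longitude'] - new_longitude).abs() <= thresh
    # combine row by row, then ask whether any row matched
    return bool((lat_close & lon_close).any())

# illustrative data, not from the original post
df = pd.DataFrame({'latitude': [10.0, 20.0], 'longitude': [50.0, 60.0]})

near = is_near_duplicate(df, 10.0005, 50.0005)   # close to the first row on both axes
mixed = is_near_duplicate(df, 10.0005, 60.0)     # latitude matches row 0, longitude matches row 1
print(near, mixed)
```

The second call shows why the masks are combined with & before calling any(): a latitude match in one row and a longitude match in a different row should not count as a duplicate point.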

You could use the numpy.isclose function with atol set to your precision:

import numpy as np
prec = 0.001
np.isclose(df['latitude'], new_latitude, atol=prec)

if (np.isclose(df['latitude'], new_latitude, atol=prec) & np.isclose(df['longitude'], new_longitude, atol=prec)).any():
   continue
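For context, here is how that check might sit inside a loop that builds up the points. This is only a sketch: the helper add_if_new, the candidate list, and the starting DataFrame are assumptions for illustration. Note that numpy.isclose tests |a - b| <= atol + rtol * |b|, and rtol defaults to 1e-05, so for coordinates of magnitude 100 or more the default relative term is about as large as a 0.001 tolerance. Passing rtol=0 keeps the comparison purely absolute.

```python
import numpy as np
import pandas as pd

prec = 0.001

def add_if_new(df, new_latitude, new_longitude, prec=prec):
    """Hypothetical helper: append the point unless an existing row
    is within prec of it in both latitude and longitude."""
    near = (
        np.isclose(df['latitude'], new_latitude, rtol=0, atol=prec)
        & np.isclose(df['longitude'], new_longitude, rtol=0, atol=prec)
    )
    if near.any():
        return df  # near-duplicate found, leave the frame unchanged
    new_row = pd.DataFrame({'latitude': [new_latitude],
                            'longitude': [new_longitude]})
    return pd.concat([df, new_row], ignore_index=True)

# illustrative data, not from the original post
df = pd.DataFrame({'latitude': [10.0], 'longitude': [50.0]})
candidates = [(10.0004, 50.0003), (10.5, 50.0), (10.5002, 49.9999)]
for lat, lon in candidates:
    df = add_if_new(df, lat, lon)

print(df)
```

Only the second candidate is appended here: the first is within prec of the starting point, and the third is within prec of the second once that one has been added.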

