I have a DataFrame named origA:
X, Y
10, 20
11, 2
9, 35
8, 7
And another one named calcB:
Xc, Yc
1, 7
9, 22
I want to check, for every (Xc, Yc) pair in calcB, whether there is an (X, Y) pair in origA whose Euclidean distance to (Xc, Yc) is less than delta. If so, put True in the respective row of a new column Detected in origA.
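You can use the cdist function from SciPy (here dfa is origA and dfb is calcB):
from scipy.spatial.distance import cdist
delta = 5
# Pairwise Euclidean distances: one row per point in dfa, one column per point in dfb
ary = cdist(dfa, dfb, metric='euclidean')
ary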
Out[189]:
array([[15.8113883 , 2.23606798],
[11.18033989, 20.09975124],
[29.12043956, 13. ],
[ 7. , 15.03329638]])
# A row is detected if any point in dfb lies closer than delta
dfa['detected'] = (ary < delta).any(axis=1)
dfa
Out[191]:
X Y detected
0 10 20 True
1 11 2 False
2 9 35 False
3 8 7 False
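To make explicit what the distance matrix and the `any(axis=1)` reduction do, here is a minimal sketch that reproduces the same computation with plain NumPy broadcasting instead of `cdist`. The variable names `a`, `b`, `diff`, and `dist` are illustrative and not part of the original answer; the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question
origA = pd.DataFrame({"X": [10, 11, 9, 8], "Y": [20, 2, 35, 7]})
calcB = pd.DataFrame({"Xc": [1, 9], "Yc": [7, 22]})
delta = 5

a = origA[["X", "Y"]].to_numpy(dtype=float)    # shape (4, 2)
b = calcB[["Xc", "Yc"]].to_numpy(dtype=float)  # shape (2, 2)

# diff[i, j] = a[i] - b[j], so diff has shape (4, 2, 2)
diff = a[:, None, :] - b[None, :, :]

# dist[i, j] = Euclidean distance between origA row i and calcB row j
dist = np.sqrt((diff ** 2).sum(axis=2))

# A row of origA is detected if at least one calcB point is closer than delta
origA["Detected"] = (dist < delta).any(axis=1)
print(origA)
```

On the sample data only the first row, (10, 20), is within delta of a calcB point, namely (9, 22) at a distance of about 2.24.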
@Wen-Ben's solution works for small datasets. However, it computes the distance between every pair of points, so runtime and memory grow with len(origA) × len(calcB), and you quickly run into performance problems with many points. Spatial index structures reduce the number of distance calculations required. One of them is BallTree (provided by scikit-learn):
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree
# Prepare the data and the search radius:
origA = pd.DataFrame()
origA['X'] = [10, 11, 9, 8]
origA['Y'] = [20, 2, 35, 7]
calcB = pd.DataFrame()
calcB['Xc'] = [1, 9]
calcB['Yc'] = [7, 22]
delta = 5
# Stack the coordinates together:
pointsA = np.column_stack([origA.X, origA.Y])
pointsB = np.column_stack([calcB.Xc, calcB.Yc])
# Create the Ball Tree and search for close points:
tree = BallTree(pointsB)
detected = tree.query_radius(pointsA, r=delta, count_only=True)
# Add results as additional column:
origA['Detected'] = detected.astype(bool)
Output:
X Y Detected
0 10 20 True
1 11 2 False
2 9 35 False
3 8 7 False
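One way to gain confidence in the tree-based query is to compare it against a brute-force distance matrix on random points. The sketch below does that. The point counts, the random seed, and the names `points_a`, `points_b`, `detected_tree`, and `detected_brute` are assumptions chosen for illustration. Points lying exactly at distance delta might be treated differently by the two approaches, and random floating-point coordinates make such ties practically impossible.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical random test data (sizes and seed are arbitrary choices)
rng = np.random.default_rng(0)
points_a = rng.uniform(0, 100, size=(500, 2))  # plays the role of origA
points_b = rng.uniform(0, 100, size=(50, 2))   # plays the role of calcB
delta = 5.0

# Tree-based: count the points_b neighbours within delta of each points_a row
tree = BallTree(points_b)
counts = tree.query_radius(points_a, r=delta, count_only=True)
detected_tree = counts > 0

# Brute force: full pairwise distance matrix of shape (500, 50)
diff = points_a[:, None, :] - points_b[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
detected_brute = (dist < delta).any(axis=1)

print("Results agree:", np.array_equal(detected_tree, detected_brute))
```

The brute-force matrix here holds 500 × 50 distances, which is still small. The tree avoids materialising that matrix, which is where the benefit shows up as the point counts grow.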