
Check if X,Y column pair in table B is within delta distance of any X,Y column pair in table A

I have a dataframe named origA:

X, Y
10, 20
11, 2
9, 35
8, 7

And another one named calcB:

Xc, Yc
1, 7
9, 22

For every Xc, Yc pair in calcB, I want to check whether there is an X, Y pair in origA whose Euclidean distance to it is less than delta. If there is, put True in that origA row of a new column Detected.

You can use cdist from scipy (here dfa and dfb stand for origA and calcB):

from scipy.spatial.distance import cdist
delta=5
# ary[i, j] is the distance from row i of dfa to row j of dfb
ary = cdist(dfa, dfb, metric='euclidean')
ary
Out[189]: 
array([[15.8113883 ,  2.23606798],
       [11.18033989, 20.09975124],
       [29.12043956, 13.        ],
       [ 7.        , 15.03329638]])
# a row of dfa is detected if any dfb point is closer than delta
dfa['detected']=(ary<delta).any(axis=1)
dfa
Out[191]: 
    X   Y  detected
0  10  20      True
1  11   2     False
2   9  35     False
3   8   7     False
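To make explicit what the cdist call computes, here is a self-contained sketch that builds the same distance matrix with plain NumPy broadcasting. The frame and column names follow the question; the intermediate names diff and dist are only illustrative.

```python
import numpy as np
import pandas as pd

origA = pd.DataFrame({"X": [10, 11, 9, 8], "Y": [20, 2, 35, 7]})
calcB = pd.DataFrame({"Xc": [1, 9], "Yc": [7, 22]})
delta = 5

# Shape (4, 1, 2) minus shape (1, 2, 2) broadcasts to (4, 2, 2):
# one coordinate difference per (origA row, calcB row) pair.
a = origA[["X", "Y"]].to_numpy()
b = calcB[["Xc", "Yc"]].to_numpy()
diff = a[:, None, :] - b[None, :, :]

# Euclidean distance for every pair; dist[i, j] pairs origA row i with calcB row j.
dist = np.sqrt((diff ** 2).sum(axis=2))

# A row of origA is detected when any calcB point lies closer than delta.
origA["Detected"] = (dist < delta).any(axis=1)
print(origA)
```

On the example data only the first row is flagged, because (10, 20) is about 2.24 away from (9, 22) and every other pair is at least 7 apart.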

@Wen-Ben's solution might work for small datasets. However, you quickly run into performance problems when you try to compute the distances for many points, because every point in one table is compared with every point in the other. There are plenty of algorithms that reduce the number of required distance calculations; one of them is BallTree (provided by scikit-learn):

import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Prepare the data and the search radius:
origA = pd.DataFrame()
origA['X'] = [10, 11, 9, 8]
origA['Y'] = [20, 2, 35, 7]

calcB = pd.DataFrame()
calcB['Xc'] = [1, 9]
calcB['Yc'] = [7, 22]

delta = 5

# Stack the coordinates together:
pointsA = np.column_stack([origA.X, origA.Y])
pointsB = np.column_stack([calcB.Xc, calcB.Yc])

# Create the Ball Tree and search for close points:
tree = BallTree(pointsB)
detected = tree.query_radius(pointsA, r=delta, count_only=True)

# Add results as additional column:
origA['Detected'] = detected.astype(bool)

Output

    X   Y   Detected
0   10  20  True
1   11  2   False
2   9   35  False
3   8   7   False
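As a sanity check, the tree search can be compared against a brute-force distance matrix on random points. The sketch below wraps the BallTree steps in a helper; the name detect_within, the point counts, and the random data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.neighbors import BallTree


def detect_within(points_a, points_b, delta):
    """Hypothetical helper: flag each row of points_a that has a points_b row within delta."""
    tree = BallTree(points_b)
    # count_only=True returns, per query point, how many tree points fall in the radius.
    counts = tree.query_radius(points_a, r=delta, count_only=True)
    return counts > 0


# Illustrative random data, not taken from the question.
rng = np.random.default_rng(0)
points_a = rng.uniform(0, 100, size=(500, 2))
points_b = rng.uniform(0, 100, size=(50, 2))
delta = 5.0

tree_mask = detect_within(points_a, points_b, delta)

# Brute force: full 500 x 50 distance matrix via broadcasting.
# With random floats a distance of exactly delta is vanishingly unlikely,
# so the choice between < and <= does not matter for this comparison.
diff = points_a[:, None, :] - points_b[None, :, :]
brute_mask = (np.sqrt((diff ** 2).sum(axis=2)) <= delta).any(axis=1)

print(np.array_equal(tree_mask, brute_mask))
```

The brute-force matrix grows with the product of the two table sizes, whereas the tree is built once over calcB and then queried per origA point, which is where the savings on large inputs come from.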
