Pandas: finding points within maximum distance

I am trying to find pairs of (x, y) points that lie within a maximum distance of each other. I thought the simplest approach would be to generate a DataFrame and go through the points one by one, checking whether any points with coordinates (x, y) lie within distance r of the given point (x_0, y_0). Then, divide the total number of discovered pairs by 2.

%pylab inline
import pandas as pd

def find_nbrs(low, high, num, max_d):
    x = random.uniform(low, high, num)
    y = random.uniform(low, high, num)
    points = pd.DataFrame({'x':x, 'y':y})

    tot_nbrs = 0

    for i in arange(len(points)):
        x_0 = points.x[i]
        y_0 = points.y[i]

        pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
        tot_nbrs += len(pt_nbrz)
        plot (pt_nbrz.x, pt_nbrz.y, 'r-')

    plot (points.x, points.y, 'b.')
    return tot_nbrs

print(find_nbrs(0, 1, 50, 0.1))

  1. First of all, it's not always finding the right pairs (I see points that are within the stated distance that are not labeled).

  2. If I write plot(..., 'or'), it highlights all the points. Which means that pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2] returns at least one (x,y). Why? Shouldn't it return an empty array if the comparison is False?

  3. How do I do all of the above more elegantly in Pandas? For example, without having to loop through each element.

The functionality you're looking for is included in scipy's spatial distance module (scipy.spatial.distance).

Here's an example of how you could use it. The real magic is in squareform(pdist(points)).

from scipy.spatial.distance import pdist, squareform
import numpy as np
import matplotlib.pyplot as plt

points = np.random.uniform(-.5, .5, (1000,2))

# Compute the distance between each distinct pair of rows in `points` with pdist.
# Then, just for ease of working, convert the condensed result to a full
# symmetric distance matrix with squareform (the diagonal is all zeros).
dists = squareform(pdist(points))

poi = points[4] # point of interest
dist_min = .1
close_points = dists[4] < dist_min

print("There are {} other points within a distance of {} from the point "
    "({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi))

There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)
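
The `- 1` in the count above removes the point's own zero distance. The same self-match is a likely reason the loop in the question always finds at least one "neighbour" for every point. For the total number of pairs, here is a minimal sketch that works on the condensed output of `pdist` directly. It assumes the setup from the question (50 points in the unit square, `max_d = 0.1`); the seeded generator and the names `n_pairs`, `n_self`, and `n_pairs_from_matrix` are only there for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)        # seeded only so the sketch is repeatable
points = rng.uniform(0, 1, (50, 2))   # assumed setup: find_nbrs(0, 1, 50, 0.1)
max_d = 0.1

# pdist returns each unordered pair (i < j) exactly once and no self-distances,
# so there is nothing to subtract and no need to divide by 2.
n_pairs = int((pdist(points) < max_d).sum())

# Cross-check against the full symmetric matrix. Every row has a zero on the
# diagonal (the point itself), which a row-by-row count also picks up.
within = squareform(pdist(points)) < max_d
n_self = int(within.diagonal().sum())            # one self-match per point
n_pairs_from_matrix = (int(within.sum()) - n_self) // 2

print(n_pairs, n_pairs_from_matrix)
```

Both counts should agree. Dividing the raw row-by-row total by 2 without first removing the self-matches would overcount by half the number of points.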

For visualization purposes:

f,ax = plt.subplots(subplot_kw=
    dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5)))
ax.plot(points[:,0], points[:,1], 'b+ ')
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g')
ax.plot(points[close_points,0], points[close_points,1],
    marker='o', mfc='none', mec='r', ls='')  # draw all points within distance

t = np.linspace(0, 2*np.pi, 512)
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance
plt.show()
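
If the points start out in a DataFrame, as in the question, and the list of pairs itself is needed, a k-d tree is one possible alternative to the full distance matrix. This is a sketch of a swapped-in technique, not part of the answer above: it uses `scipy.spatial.cKDTree.query_pairs`, which returns index pairs `(i, j)` with `i < j` whose distance is at most `r` (so `<=` rather than the strict `<` used in the question). The column names `x` and `y` follow the question; `pair_table` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
points = pd.DataFrame({'x': rng.uniform(0, 1, 50),
                       'y': rng.uniform(0, 1, 50)})
max_d = 0.1

# Build the tree once from the coordinate columns.
tree = cKDTree(points[['x', 'y']].to_numpy())

# Each pair appears once as (i, j) with i < j; a point is never paired with itself.
pairs = sorted(tree.query_pairs(r=max_d))

# Put the pairs in a DataFrame; i and j are positions of the rows in `points`.
pair_table = pd.DataFrame(pairs, columns=['i', 'j'])
print(len(pair_table), "pairs within", max_d)
```

Because the tree avoids building the full N-by-N matrix, this approach tends to scale better when there are many points and only a few of them are close together.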
