Find optimal unique neighbour pairs based on closest distance

Question

General problem

First, let me explain the problem more generally. I have a collection of points with x,y coordinates and want to find the optimal unique neighbour pairs such that the distance between the neighbours in all pairs is minimised, where no point can be used in more than one pair.

Some simple examples

Note: points are not ordered, and the x and y coordinates both vary between 0 and 1000, but for simplicity the examples below use x == y and ordered items.

First, let's say I have the following matrix of points:

matrix1 = np.array([[1, 1],[2, 2],[5, 5],[6, 6]])

For this dataset, the output should be [0,0,1,1], as points 1 and 2 are closest to each other, and so are points 3 and 4, giving pairs 0 and 1.

Second, two points cannot have the same partner. If we have the matrix:

matrix2 = np.array([[1, 1],[2, 2],[4, 4],[6, 6]])

Here pt1 and pt3 are closest to pt2, but pt1 is relatively closer, so the output should again be [0,0,1,1].

Third, if we have the matrix:

matrix3 = np.array([[1, 1],[2, 2],[3, 3],[4, 4]])

Now pt1 and pt3 are again closest to pt2, but this time they are at the same distance. The output should again be [0,0,1,1], as pt4 is closest to pt3.

Fourth, in the case of an odd number of points, the furthest point should be made nan, e.g.

matrix4 = np.array([[1, 1],[2, 2],[4,4]])

should give output [0,0,nan].

Fifth, in the case where three or more points have exactly the same distance, the pairing can be random, e.g.

matrix5 = np.array([[1, 1],[2, 2],[3, 3]])

Both [0,0,nan] and [nan,0,0] would be fine as output.
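For small inputs, the expected outputs in the examples above can be checked with an exhaustive search. The following is a minimal sketch, not an efficient method. It assumes that "optimal" means the smallest summed distance over all pairs, and that with an odd number of points the unpaired point is whichever one gives the smallest sum. The names `best_pairs` and `pair_labels` are hypothetical helpers introduced for illustration.

```python
import math

def best_pairs(points):
    """Return (total_distance, pairs) minimising the summed pair distance.

    Exhaustive search: only practical for roughly a dozen points or fewer.
    With an odd number of points, exactly one point is left unpaired.
    """
    def search(remaining, allow_skip):
        if len(remaining) < 2:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best = (math.inf, [])
        # Option 1: pair the first remaining point with each other point.
        for k, other in enumerate(rest):
            cost, pairs = search(rest[:k] + rest[k + 1:], allow_skip)
            cost += math.dist(points[first], points[other])
            if cost < best[0]:
                best = (cost, [(first, other)] + pairs)
        # Option 2 (odd counts only): leave the first point unpaired.
        if allow_skip:
            cost, pairs = search(rest, False)
            if cost < best[0]:
                best = (cost, pairs)
        return best

    n = len(points)
    return search(tuple(range(n)), n % 2 == 1)


def pair_labels(n, pairs):
    """Turn index pairs into per-point labels; unpaired points get nan."""
    labels = [math.nan] * n
    for label, (i, j) in enumerate(sorted(pairs)):
        labels[i] = labels[j] = label
    return labels


matrix1 = [[1, 1], [2, 2], [5, 5], [6, 6]]
total, pairs = best_pairs(matrix1)
labels = pair_labels(len(matrix1), pairs)
```

Because the search tries every pairing, its cost grows very quickly with the number of points, so it is only useful as a reference for checking a faster method.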

My effort

Using sklearn:

import numpy as np
from sklearn.neighbors import NearestNeighbors
data = matrix3
nbrs = NearestNeighbors(n_neighbors=len(data), algorithm="ball_tree")
nbrs = nbrs.fit(data)
distances,indices = nbrs.kneighbors(data)

This outputs the indices:

array([[0, 1, 2, 3],
       [1, 2, 0, 3],
       [2, 1, 3, 0],
       [3, 2, 1, 0]])

The second column provides the nearest points:

nearinds = indices[:,1]

Next, in case there are duplicates in the list, we need to find the nearest distance:

if len(set(nearinds)) != len(nearinds):
    dupvals = [i for i in set(nearinds) if list(nearinds).count(i) > 1]
    for dupval in dupvals:
        dupinds = [i for i,j in enumerate(nearinds) if j == dupval]
        dupdists = distances[dupinds,1]

Using these dupdists I would be able to find that one is closer to the pt than the other:

        if len(set(dupdists))==len(dupdists):
            duppriority = np.argsort(dupdists)

Using the duppriority values we can give the closer pt its correct pairing. But the pairing of the other point will then depend on its second-nearest neighbour and on the distance of all other points to that same point. Furthermore, if both points are the same distance from their closest point, I would also need to go one layer deeper:

        if len(set(dupdists))!=len(dupdists):
            dupdists2 = [distances[i,2] for i,j in enumerate(nearinds) if j == dupval]
            if len(set(dupdists2))==len(dupdists2):
                duppriority2 = np.argsort(dupdists2)

etc.

I am kind of stuck here and also feel that this approach is not very efficient, especially for cases more complicated than 4 points, where multiple points can be a similar distance from one or more nearest, second-nearest, etc. points.

I also found that scipy has a similar one-line command that could be used to get the distances and indices:

from scipy.spatial import cKDTree
distances,indices = cKDTree(matrix3).query(matrix3, k=len(matrix3))

so I am wondering whether one would be better to continue with than the other.
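Since k=len(data) asks for every neighbour of every point, both calls amount to computing the full pairwise distance matrix. As a sketch under that assumption, the same information can be obtained directly with scipy.spatial.distance.cdist, which is a different function from the two neighbour-search calls above. The variable names `dist` and `nearest` are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

matrix3 = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])

# dist[i, j] is the Euclidean distance between point i and point j
dist = cdist(matrix3, matrix3)

# A point may not be paired with itself, so mask the diagonal
np.fill_diagonal(dist, np.inf)

# Index of the nearest other point for each row (first index wins on ties)
nearest = dist.argmin(axis=1)
```

For the 2 to 10 points mentioned later in the question, a dense matrix like this is small, so a tree structure is unlikely to matter for speed.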

More specific problem that I want to solve

I have a list of points and need to match them optimally to a list of points from the previous time step. The number of points is generally limited, ranging from 2 to 10, and is generally consistent over time (i.e. it won't jump much between values). Data tends to look like:

prevdat = {'loc': [(300, 200), (425, 400), (400, 300)], 'contid': [0, 1, 2]}
currlocs = [(435, 390), (405, 295), (290, 215), (440, 330)]

Points are generally closer to their own earlier position than to the other points. Thus I should be able to link the identities of the points over time. There are, however, a number of complications that need to be overcome:

sometimes the number of current and previous points is not equal
points often have the same closest neighbour but should not be allocated the same identity
points sometimes have the same distance to their closest neighbour (but this is very unlikely for the 2nd, 3rd, etc. nearest neighbours)
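Matching current points to previous points, with unequal counts and no shared identities, has the shape of a rectangular assignment problem. The sketch below is one possible approach and not something taken from the post: it uses scipy.optimize.linear_sum_assignment on the matrix of distances between current and previous locations, assuming that the best matching is the one with the smallest summed distance. The function name `match_to_previous` is hypothetical, and unmatched current points are given None here as a placeholder.

```python
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

prevdat = {'loc': [(300, 200), (425, 400), (400, 300)], 'contid': [0, 1, 2]}
currlocs = [(435, 390), (405, 295), (290, 215), (440, 330)]

def match_to_previous(prevdat, currlocs):
    """Map each current point to a previous contid, or None if unmatched."""
    # Rows are current points, columns are previous points
    cost = cdist(currlocs, prevdat['loc'])
    # Each row and each column is used at most once; total cost is minimised
    rows, cols = linear_sum_assignment(cost)
    ids = [None] * len(currlocs)
    for r, c in zip(rows, cols):
        ids[r] = prevdat['contid'][c]
    return ids

ids = match_to_previous(prevdat, currlocs)   # [1, 2, 0, None] for this data
```

Points left as None would need a fresh identity. A distance threshold for rejecting implausible matches is not included and would be a separate design decision.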

Any advice to help solve my problem would be much appreciated. I hope my examples and effort above will help. Thanks!

Answer 1

This can be formulated as a mixed-integer linear programming problem.
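Written out, one such formulation looks as follows. This is a sketch in notation introduced here for illustration: $d_{ij}$ is the distance between points $i$ and $j$, $N$ is the number of points, and $u_{ij}$ is a binary variable that equals 1 when point $i$ is connected to point $j$.

```latex
\begin{aligned}
\min_{u} \quad & \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}\, u_{ij} \\
\text{s.t.} \quad & u_{ij} \in \{0, 1\} && \text{for all } i, j \\
& \sum_{j=1}^{N} u_{ij} + \sum_{j=1}^{N} u_{ji} \le 1 && \text{for all } i \\
& \sum_{i=1}^{N} \sum_{j=1}^{N} u_{ij} \ge \left\lfloor N/2 \right\rfloor
\end{aligned}
```

The second constraint counts $u_{ii}$ twice, so it also rules out pairing a point with itself. The last constraint forces as many pairs as possible, which leaves exactly one point unpaired when $N$ is odd.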

In Python you can model and solve such problems using cvxpy.

import numpy as np
import cvxpy

def connect_point_cloud(points):
    '''
    Given a set of points, return a matrix marking the pairs of points
    whose summed distance is minimised
    '''
    N = points.shape[0]
    I, J = np.indices((N, N))
    # d[i, j] is the Euclidean distance between point i and point j
    d = np.sqrt(sum((points[I, i] - points[J, i])**2 for i in range(points.shape[1])))

    use = cvxpy.Variable((N, N), integer=True)
    # each entry use[i,j] indicates that the point i is connected to point j
    # each pair may count 0 or 1 times
    constraints = [use >= 0, use <= 1]
    # point i must be used in at most one connection
    constraints += [cvxpy.sum(use[i, :]) + cvxpy.sum(use[:, i]) <= 1 for i in range(N)]
    # at least floor(N/2) connections must be present
    constraints += [cvxpy.sum(use) >= N // 2]

    # let the solver handle the problem (needs a mixed-integer capable solver installed)
    P = cvxpy.Problem(cvxpy.Minimize(cvxpy.sum(cvxpy.multiply(use, d))), constraints)
    dist = P.solve()
    return use.value
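The question asks for per-point labels such as [0,0,1,1] rather than a connection matrix. A minimal sketch of that conversion follows, assuming the matrix returned by connect_point_cloud is close to 0/1 and marks each point in at most one connection. The name `labels_from_use` is a hypothetical helper, not part of cvxpy.

```python
import numpy as np

def labels_from_use(use_matrix):
    """Convert a 0/1 connection matrix into per-point pair labels."""
    # Round first: a solver may return values such as 0.9999 or 1e-9
    use_matrix = np.rint(np.asarray(use_matrix, dtype=float))
    # Points that appear in no connection keep the label nan
    labels = np.full(use_matrix.shape[0], np.nan)
    # np.nonzero lists the connected (i, j) entries in row-major order
    for label, (i, j) in enumerate(zip(*np.nonzero(use_matrix))):
        labels[i] = labels[j] = label
    return labels

# Example with a hand-written connection matrix for three points:
example = labels_from_use([[0, 1, 0],
                           [0, 0, 0],
                           [0, 0, 0]])
```

Labels are numbered in the order the connections appear in the matrix, so they identify pairs but carry no other meaning.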

Here is a piece of code to visualize the result for a 2D problem:

import matplotlib.pyplot as plt

# create a random set with 50 points
p = np.random.rand(50, 2)
# find the pairs with minimum distance
pairs = connect_point_cloud(p)

# plot all the points with circles
plt.plot(p[:, 0], p[:, 1], 'o')

# plot lines connecting the points
# (round first, since the solver may return values such as 0.9999 or 1e-9)
for i1, i2 in zip(*np.nonzero(np.rint(pairs))):
    plt.plot([p[i1,0], p[i2,0]], [p[i1,1], p[i2,1]])
plt.show()

Solution 1
5 · Accepted · 2021-05-03 15:18:43