
Find optimal unique neighbour pairs based on closest distance

General problem

First let's explain the problem more generally. I have a collection of points with x,y coordinates and want to find the optimal unique neighbour pairs such that the distance between the neighbours in all pairs is minimised, but points cannot be used in more than one pair.

Some simple examples

Note: points are not ordered and x and y coordinates will both vary between 0 and 1000, but for simplicity in below examples x==y and items are ordered.

First, let's say I have the following matrix of points:

matrix1 = np.array([[1, 1],[2, 2],[5, 5],[6, 6]])

For this dataset, the output should be [0,0,1,1], as points 1 and 2 are closest to each other and so are points 3 and 4, giving pairs 0 and 1.

Second, two points cannot have the same partner. If we have the matrix:

matrix2 = np.array([[1, 1],[2, 2],[4, 4],[6, 6]])

Here pt1 and pt3 are closest to pt2, but pt1 is relatively closer, so the output should again be [0,0,1,1] .

Third, if we have the matrix:

matrix3 = np.array([[1, 1],[2, 2],[3, 3],[4, 4]])

Now pt1 and pt3 are again closest to pt2 but now they are at the same distance. Now the output should again be [0,0,1,1] as pt4 is closest to pt3.

Fourth, in the case of an odd number of points, the furthest point should be made nan, e.g.

matrix4 = np.array([[1, 1],[2, 2],[4,4]])

should give output [0,0,nan]

Fifth, in the case there are three or more points with exactly the same distance, the pairing can be random, eg

matrix5 = np.array([[1, 1],[2, 2],[3, 3]])

both an output of [0,0,nan] and [nan,0,0] should be fine.

My effort

Using sklearn:

import numpy as np
from sklearn.neighbors import NearestNeighbors
data = matrix3
nbrs = NearestNeighbors(n_neighbors=len(data), algorithm="ball_tree")
nbrs = nbrs.fit(data)
distances,indices = nbrs.kneighbors(data)

This outputs the indices:

array([[0, 1, 2, 3],
       [1, 2, 0, 3],
       [2, 1, 3, 0],
       [3, 2, 1, 0]])

The second column provides the nearest points:

nearinds = indices[:,1]

Next in case there are duplicates in the list we need to find the nearest distance:

if len(set(nearinds)) != len(nearinds):
    dupvals = [i for i in set(nearinds) if list(nearinds).count(i) > 1]
    for dupval in dupvals:
        dupinds = [i for i,j in enumerate(nearinds) if j == dupval]
        dupdists = distances[dupinds,1]

Using these dupdists I would be able to find that one is closer to the pt than the other:

        if len(set(dupdists))==len(dupdists):
            duppriority = np.argsort(dupdists)

Using the duppriority values we can give the closer pt its right pairing. But the pairing of the other point will then depend on its second-nearest neighbour and on the distance of all other points to that same neighbour. Furthermore, if both points are at the same distance from their closest point, I would also need to go one layer deeper:

        if len(set(dupdists))!=len(dupdists):
            dupdists2 = [distances[i,2] for i,j in enumerate(nearinds) if j == dupval]
            if len(set(dupdists2))==len(dupdists2):
                duppriority2 = np.argsort(dupdists2)  

etc..

I am kind of stuck here and also feel this approach is not very efficient, especially for more complicated cases than 4 points, where multiple points can be at a similar distance to one or more of their nearest, second-nearest, etc. points.

I also found that with scipy there is a similar one-line command that could be used to get the distances and indices:

from scipy.spatial import cKDTree
distances,indices = cKDTree(matrix3).query(matrix3, k=len(matrix3))

so I am wondering whether one would be better to continue with than the other.
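Both calls above ask for k=len(data) neighbours, so they effectively return every pairwise distance, only sorted per point. For a handful of points the same information can be held in a dense distance matrix. The following is a minimal sketch in plain NumPy, not taken from the original post; the names diff, dist and nearest are illustrative, and ties are resolved to the lowest index by argmin, which can differ from the order the tree-based queries return.

```python
import numpy as np

matrix3 = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])

# Pairwise coordinate differences, shape (N, N, 2), via broadcasting.
diff = matrix3[:, None, :] - matrix3[None, :, :]
# dist[i, j] is the Euclidean distance between point i and point j.
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Mask the zero self-distances so a point is never its own neighbour.
np.fill_diagonal(dist, np.inf)
# Index of the nearest other point for every point.
nearest = dist.argmin(axis=1)
```

With the matrix in hand, duplicates in nearest show which points compete for the same partner, which is the situation the examples above describe.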

More specific problem that I want to solve

I have a list of points and need to match them optimally to the list of points from the previous time step. The number of points is generally limited and ranges from 2 to 10, but is generally consistent over time (i.e. it won't jump much between values over time). Data tends to look like:

prevdat = {'loc': [(300, 200), (425, 400), (400, 300)], 'contid': [0, 1, 2]}
currlocs = [(435, 390), (405, 295), (290, 215), (440, 330)]

A point is generally closer to its own previous position than to the previous positions of other points. Thus I should be able to link the identities of the points over time. There are however a number of complications that need to be overcome:

  1. sometimes the numbers of current and previous points are not equal
  2. points often have the same closest neighbour but should not be able to be allocated the same identity
  3. points sometimes have the same distance to their closest neighbour (but this is very unlikely for the 2nd, 3rd nearest neighbours etc.)

Any advice to help solve my problem would be much appreciated. I hope my examples and effort above will help. Thanks!

This can be formulated as a mixed integer linear programming problem.

In Python you can model and solve such problems using cvxpy.
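Written out, the formulation used by the code that follows looks roughly like this. The symbols are chosen here for illustration: d_{ij} is the distance between points i and j, u_{ij} corresponds to the variable `use[i,j]`, and N is the number of points.

```latex
\begin{aligned}
\min_{u}\quad & \sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij}\,u_{ij} \\
\text{s.t.}\quad & u_{ij} \in \{0, 1\} && \forall\, i, j \\
& \sum_{j=1}^{N} u_{ij} + \sum_{j=1}^{N} u_{ji} \le 1 && \forall\, i \\
& \sum_{i=1}^{N}\sum_{j=1}^{N} u_{ij} \ge \left\lfloor N/2 \right\rfloor
\end{aligned}
```

The second constraint counts u_{ii} twice, so with integer values it also rules out pairing a point with itself.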

import numpy as np
import cvxpy

def connect_point_cloud(points):
    '''
    Given a set of points, return a matrix marking the pairs of points
    whose summed distance is minimised.
    '''
    N = points.shape[0]
    I, J = np.indices((N, N))
    # d[i, j] is the Euclidean distance between point i and point j
    d = np.sqrt(sum((points[I, i] - points[J, i])**2 for i in range(points.shape[1])))

    use = cvxpy.Variable((N, N), integer=True)
    # each entry use[i,j] indicates that the point i is connected to point j
    # each pair may count 0 or 1 times
    constraints = [use >= 0, use <= 1]
    # point i must be used in at most one connection
    constraints += [cvxpy.sum(use[i, :]) + cvxpy.sum(use[:, i]) <= 1 for i in range(N)]
    # at least floor(N/2) connections must be present
    constraints += [cvxpy.sum(use) >= N // 2]

    # let the solver handle the problem
    # (cvxpy needs a solver with mixed-integer support to be available)
    P = cvxpy.Problem(cvxpy.Minimize(cvxpy.sum(cvxpy.multiply(d, use))), constraints)
    dist = P.solve()
    return use.value
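Since the question mentions only 2 to 10 points, the solver output can be cross-checked against an exhaustive search. The sketch below is not part of the original answer: brute_force_pairs is a hypothetical helper that tries every pairing, minimises the summed distance, and returns labels in the format the question asks for (paired points share an integer, the left-over point of an odd set gets nan). It assumes that "minimal total distance" is the intended criterion; for the question's examples this also leaves the furthest point unpaired.

```python
import numpy as np

def brute_force_pairs(points):
    """Exhaustively search all pairings of a small point set (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Dense matrix of Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    def search(remaining, skip_allowed):
        # Return (total distance, list of pairs) for the best pairing of `remaining`.
        if len(remaining) < 2:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_cost, best_pairs = np.inf, []
        # Option 1: pair `first` with each other remaining point in turn.
        for k, other in enumerate(rest):
            cost, pairs = search(rest[:k] + rest[k + 1:], skip_allowed)
            cost += dist[first, other]
            if cost < best_cost:
                best_cost, best_pairs = cost, [(first, other)] + pairs
        # Option 2 (odd counts only): leave `first` unpaired.
        if skip_allowed:
            cost, pairs = search(rest, False)
            if cost < best_cost:
                best_cost, best_pairs = cost, pairs
        return best_cost, best_pairs

    _, pairs = search(tuple(range(n)), n % 2 == 1)
    # Paired points share a label; an unpaired point keeps nan.
    labels = np.full(n, np.nan)
    for label, (i, j) in enumerate(sorted(pairs)):
        labels[[i, j]] = label
    return labels

labels1 = brute_force_pairs([[1, 1], [2, 2], [5, 5], [6, 6]])
labels4 = brute_force_pairs([[1, 1], [2, 2], [4, 4]])
```

For matrix1 this gives the labels 0, 0, 1, 1, and for matrix4 it gives 0, 0, nan. The number of pairings grows very quickly with the number of points, so this is only meant as a check for small inputs.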

Here is a piece of code to visualize the result of connect_point_cloud for a 2D problem:

import matplotlib.pyplot as plt

# create a random set with 50 points
p = np.random.rand(50, 2)
# find the pairs with minimum total distance
pairs = connect_point_cloud(p)

# plot all the points with circles
plt.plot(p[:, 0], p[:, 1], 'o')

# plot lines connecting the points; threshold at 0.5 because the solver
# returns floating-point values that may not be exactly 0 or 1
for i1, i2 in zip(*np.nonzero(pairs > 0.5)):
    plt.plot([p[i1,0], p[i2,0]], [p[i1,1], p[i2,1]])
plt.show()
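For the more specific problem in the question (linking current points to previous ones), the two sets are distinct, so it can also be treated as an assignment problem. The following is a minimal sketch, not part of the original answer, that uses scipy.optimize.linear_sum_assignment on a distance matrix built with scipy.spatial.distance.cdist. The function name match_to_previous and the choice of None for unmatched points are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_to_previous(prev_locs, prev_ids, curr_locs):
    """Give each current point the id of one previous point, or None (hypothetical helper)."""
    # cost[i, j] is the distance from current point i to previous point j.
    cost = cdist(np.asarray(curr_locs, dtype=float),
                 np.asarray(prev_locs, dtype=float))
    # One-to-one assignment minimising the summed distance; for a rectangular
    # cost matrix only min(n_current, n_previous) pairs are returned.
    rows, cols = linear_sum_assignment(cost)
    ids = [None] * len(curr_locs)
    for r, c in zip(rows, cols):
        ids[r] = prev_ids[c]
    return ids

prevdat = {'loc': [(300, 200), (425, 400), (400, 300)], 'contid': [0, 1, 2]}
currlocs = [(435, 390), (405, 295), (290, 215), (440, 330)]
ids = match_to_previous(prevdat['loc'], prevdat['contid'], currlocs)
# ids is [1, 2, 0, None]: the fourth current point has no previous partner
```

Because the assignment is one-to-one, two current points can never receive the same identity, and an unequal number of points simply leaves some unmatched. A caller would still have to decide what to do with those, for example by issuing a new id.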

