Compute distances between all points in array efficiently using Python

Question

I have a list of N=3 points like this as input: points = [[1, 1], [2, 2], [4, 4]]

I wrote this code to compute all possible distances between all elements of my list points, as dist = min(|x1−x2|, |y1−y2|):

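N = len(points)  # number of points (3 in this example)
distances = []
for i in range(N-1):
    for j in range(i+1,N):
        # distance = smaller of the two per-coordinate absolute differences
        dist = min(abs(points[i][0]-points[j][0]), abs(points[i][1]-points[j][1]))
        distances.append(dist)
print(distances)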

My output will be the array distances with all the distances saved in it: [1, 3, 2]

It works fine with N=3, but I would like to compute it more efficiently and be free to set N=10^5. I am also trying to use numpy and scipy, but I am having a little trouble replacing the loops and choosing the correct method.

Can anybody help me please? Thanks in advance

Answer 1

The numpythonic solution

To compute your distances using the full power of NumPy, and do it substantially faster:

Convert your points to a NumPy array:
```
import numpy as np

pts = np.array(points)
```

Then run:

 dist = np.abs(pts[np.newaxis, :, :] - pts[:, np.newaxis, :]).min(axis=2)

Here the result is a square array. But if you want to get a list of elements above the diagonal, just like your code generates, you can run:

dist2 = dist[np.triu_indices(pts.shape[0], 1)].tolist()
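To make the broadcasting step easier to follow, here is a minimal sketch that runs the same two expressions on the three points from the question and spells out the intermediate shapes. The name `diff` is introduced here only for illustration; it does not appear in the original one-liner.

```python
import numpy as np

points = [[1, 1], [2, 2], [4, 4]]
pts = np.array(points)                      # shape (3, 2)

# pts[np.newaxis, :, :] has shape (1, 3, 2); pts[:, np.newaxis, :] has shape (3, 1, 2).
# Broadcasting the subtraction gives shape (3, 3, 2), with diff[i, j] = pts[j] - pts[i].
diff = pts[np.newaxis, :, :] - pts[:, np.newaxis, :]

# Taking the minimum over the last axis picks min(|dx|, |dy|) for every pair (i, j).
dist = np.abs(diff).min(axis=2)             # shape (3, 3), symmetric, zeros on the diagonal

# Keep only the entries strictly above the diagonal, in row-major order.
dist2 = dist[np.triu_indices(pts.shape[0], 1)].tolist()
print(dist2)  # [1, 3, 2]
```

The printed list matches the output of the loop in the question.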

I ran this code for the following 9 points:

points = [[1, 1], [2, 2], [4, 4], [3, 5], [2, 8], [4, 10], [3, 7], [2, 9], [4, 7]]

For the above data, the result saved in dist (a full array) is:

array([[0, 1, 3, 2, 1, 3, 2, 1, 3],
       [1, 0, 2, 1, 0, 2, 1, 0, 2],
       [3, 2, 0, 1, 2, 0, 1, 2, 0],
       [2, 1, 1, 0, 1, 1, 0, 1, 1],
       [1, 0, 2, 1, 0, 2, 1, 0, 1],
       [3, 2, 0, 1, 2, 0, 1, 1, 0],
       [2, 1, 1, 0, 1, 1, 0, 1, 0],
       [1, 0, 2, 1, 0, 1, 1, 0, 2],
       [3, 2, 0, 1, 1, 0, 0, 2, 0]])

and the list of elements above the diagonal is:

[1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 0, 2, 1, 0, 2, 1, 2, 0, 1, 2, 0, 1, 1, 0, 1, 1,
  2, 1, 0, 1, 1, 1, 0, 1, 0, 2]
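As a sanity check, the vectorized result can be compared against the double loop from the question on the same 9 points. This is just a sketch of such a check; the name `expected` is an assumption made for the example.

```python
import numpy as np

points = [[1, 1], [2, 2], [4, 4], [3, 5], [2, 8], [4, 10], [3, 7], [2, 9], [4, 7]]
N = len(points)

# Reference result: the double loop from the question.
expected = []
for i in range(N - 1):
    for j in range(i + 1, N):
        expected.append(min(abs(points[i][0] - points[j][0]),
                            abs(points[i][1] - points[j][1])))

# Vectorized result: full square array, then the part above the diagonal.
pts = np.array(points)
dist = np.abs(pts[np.newaxis, :, :] - pts[:, np.newaxis, :]).min(axis=2)
dist2 = dist[np.triu_indices(N, 1)].tolist()

assert dist2 == expected      # same values, same order
print(len(dist2))             # 36, i.e. N * (N - 1) // 2 pairs
```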

How much faster is my code

It turns out that even for a sample as small as the one I used (9 points), my code runs 2 times faster. For a sample of 18 points (not presented here), it is 6 times faster.

This speedup was achieved even though my code computes twice as much as needed: it generates a full array, whereas the lower-triangular part of the result is just a mirror image of the upper-triangular part (which is what your code computes).
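If the mirrored half and the memory of the full square array are a concern (for N=10^5 the square array alone would hold 10^10 entries), one possible variation is to keep a single Python loop over i and vectorize only the inner loop. This is a sketch under that assumption, not part of the original answer; `upper_triangle_rows` is a hypothetical helper name.

```python
import numpy as np

def upper_triangle_rows(pts):
    """Yield, for each i, the distances from point i to points i+1 .. N-1."""
    for i in range(len(pts) - 1):
        # Differences between point i and all later points, shape (N-1-i, 2);
        # the minimum over axis 1 gives min(|dx|, |dy|) for each pair.
        yield np.abs(pts[i + 1:] - pts[i]).min(axis=1)

pts = np.array([[1, 1], [2, 2], [4, 4]])

# Concatenating the rows reproduces the order of the double loop.
distances = np.concatenate(list(upper_triangle_rows(pts))).tolist()
print(distances)  # [1, 3, 2]
```

Because it is a generator, each row can also be processed and discarded in turn instead of being concatenated, so only O(N) values need to be held at any one time.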

For a larger number of points the difference should be much bigger. Run your test on a bigger sample (say 100 points) and report how many times faster my code was.
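One way to run such a comparison is with the standard-library `timeit` module. The sketch below uses 100 random integer points; the random data, the seed, the function names `loop_version` and `numpy_version`, and the repetition count are all assumptions made for the example. The measured ratio depends on the machine, so no expected figure is given.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
N = 100
points = rng.integers(0, 1000, size=(N, 2)).tolist()   # plain Python lists for the loop
pts = np.array(points)                                  # NumPy array for the vectorized code

def loop_version():
    # Double loop from the question.
    distances = []
    for i in range(N - 1):
        for j in range(i + 1, N):
            distances.append(min(abs(points[i][0] - points[j][0]),
                                 abs(points[i][1] - points[j][1])))
    return distances

def numpy_version():
    # Broadcasting solution from this answer.
    dist = np.abs(pts[np.newaxis, :, :] - pts[:, np.newaxis, :]).min(axis=2)
    return dist[np.triu_indices(N, 1)].tolist()

# Both versions must agree before timing them.
assert loop_version() == numpy_version()

t_loop = timeit.timeit(loop_version, number=20)
t_numpy = timeit.timeit(numpy_version, number=20)
print(f"loop: {t_loop:.4f}s  numpy: {t_numpy:.4f}s  ratio: {t_loop / t_numpy:.1f}x")
```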

1 ACCEPTED 2020-08-31 17:58:28
