How can I optimize the distance between 2 points (x,y,z) and two arrays

Question

I need to calculate the distance between each pixel and each centroid.

Arguments:

X (numpy array): PxD 1st set of data points (usually data points)
C (numpy array): KxD 2nd set of data points (usually cluster centroid points)

Returns:

dist: PxK numpy array; position ij is the distance between the i-th point of the first set and the j-th point of the second set

import math

import numpy

def distance(X, C):
    # dist[i][j] holds the distance between X[i] and C[j]
    dist = numpy.empty((X.shape[0], C.shape[0]))

    for i, x in enumerate(X):
        for j, c in enumerate(C):
            dist[i][j] = euclidean_dist(x, c)

    return dist

def euclidean_dist(x, y):
    # Assumes 3-D points (D == 3)
    x1, y1, z1 = x
    x2, y2, z2 = y
    return math.sqrt((x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2)

Answer 1

If you can add a scipy dependency, this is already implemented in scipy.spatial.distance.cdist. Otherwise, we can use numpy broadcasting and numpy.linalg.norm:

Scipy Implementation

from scipy.spatial import distance
distance.cdist(X, C, 'euclidean')

Numpy Implementation

import numpy as np
np.linalg.norm(X[:,None,:] - C, axis=2)
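
To make the broadcasting step concrete, here is a minimal sketch of the same expression split into two steps on a tiny input. The function name pairwise_dist and the sample points are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def pairwise_dist(X, C):
    # X[:, None, :] has shape (P, 1, D); subtracting C of shape (K, D)
    # broadcasts to a (P, K, D) array of coordinate differences.
    diff = X[:, None, :] - C
    # Taking the norm over the last axis collapses D, leaving (P, K).
    return np.linalg.norm(diff, axis=2)

# Hypothetical sample data: P = 2 points, K = 3 centroids, D = 3
X = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]])
C = np.array([[0.0, 0.0, 0.0], [0.0, 3.0, 4.0], [1.0, 2.0, 2.0]])

dist = pairwise_dist(X, C)
print(dist.shape)  # (2, 3)
print(dist[0])     # distances from X[0] to each centroid
```

Row i of the result lists the distances from X[i] to every centroid, which matches the PxK layout the question asks for.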

Performance

P = 100_000
K = 1_000
D = 3

X = np.random.randint(0,10, (P,D))
C = np.random.randint(0,10, (K,D))

%timeit distance.cdist(X, C, 'euclidean')
1.06 s ± 57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.linalg.norm(X[:,None,:] - C, axis=2)
15 s ± 2.18 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

We can see that for large X and C the scipy implementation is much faster: about 1 s versus 15 s in the timings above, roughly 14 times faster.
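
One likely contributor to the gap (an assumption, not something measured above) is that the broadcasting expression builds a full P x K x D intermediate array before reducing it. The sketch below processes X in chunks so that the intermediate stays small. The function name pairwise_dist_chunked and the chunk_size parameter are hypothetical, and the chunk size is not tuned.

```python
import numpy as np

def pairwise_dist_chunked(X, C, chunk_size=1024):
    # chunk_size is an illustrative parameter, not a tuned value.
    P, K = X.shape[0], C.shape[0]
    dist = np.empty((P, K))
    for start in range(0, P, chunk_size):
        stop = min(start + chunk_size, P)
        # The intermediate is at most (chunk_size, K, D) instead of (P, K, D).
        dist[start:stop] = np.linalg.norm(X[start:stop, None, :] - C, axis=2)
    return dist

# Small check that the chunked result matches the one-shot expression
rng = np.random.default_rng(0)
X = rng.integers(0, 10, (2500, 3))
C = rng.integers(0, 10, (7, 3))

full = np.linalg.norm(X[:, None, :] - C, axis=2)
chunked = pairwise_dist_chunked(X, C, chunk_size=1000)
print(np.allclose(full, chunked))
```

Whether this narrows the gap with cdist would need to be timed on the actual data; if scipy is available, cdist remains the simpler choice.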

1 answer

solution1
4 ACCEPTED 2020-04-13 19:53:00