简体   繁体   中英

numpy - distances between two points from vectors of shape(n, 2)

Background

Suppose I have a ndarray of coordinates of shape (n, 2) where each coordinate is (x, y).

X = np.random.random(shape) * 10  # just to generate (x,y).
---
[[9.47743968 8.60682597]
 [7.35620992 6.87031756]
 [5.05200433 3.62373581]
 [4.33732145 3.72994235]
 [4.34982473 4.46453609]]
...

A distance between two vectors Xi and Xj in X is |Xj - Xi| . Get the distances of all combinations in X could be done as in the image.

在此处输入图像描述

Question

Is it possible to do this with numpy only, not scipy (eg scipy.spatial.distance.pdist(X, metric='euclidean', *args, **kwargs) ? Please help understand available numpy functions and how to achieve it.

Research

SciPy

I though I could look into scipy code as scipy.spatial.distance.pdist(X, metric='euclidean', *args, **kwargs) seems to get the distances from vectors.

Pairwise distances between observations in n-dimensional space.

import scipy.spatial
scipy.spatial.distance.pdist(X)
---
array([2.74136411, 6.6645079 , 7.08553522, 6.59173729, 3.9811627 ,
       4.35610423, 3.85047223, 0.72253128, 1.09544571, 0.73470014])

scipy distance.py line 2050-2059 appears to be the code that calls a distance function for a corresponding method (eg euclidean). However it goes into a C code distance_wrap.c , hence not numpy.

static PyObject *pdist_seuclidean_double_wrap(PyObject *self, PyObject *args, 
                                              PyObject *kwargs) 
{
  PyArrayObject *X_, *dm_, *var_;
  int m, n;
  double *dm;
  const double *X, *var;
  static char *kwlist[] = {"X", "dm", "V", NULL};
  if (!PyArg_ParseTupleAndKeywords(args, kwargs, 
            "O!O!O!:pdist_seuclidean_double_wrap", kwlist,
            &PyArray_Type, &X_,
            &PyArray_Type, &dm_,
            &PyArray_Type, &var_)) {
    return 0;
  }
  else {
    NPY_BEGIN_ALLOW_THREADS;
    X = (double*)X_->data;
    dm = (double*)dm_->data;
    var = (double*)var_->data;
    m = X_->dimensions[0];
    n = X_->dimensions[1];

    pdist_seuclidean(X, var, dm, m, n);
    NPY_END_ALLOW_THREADS;
  }
  return Py_BuildValue("d", 0.0);
}
shape = (10,2)
X = np.random.random(shape)
# X[:,0] --> x values
# X[:,1] --> y values
dist = np.sqrt(np.sum((X[:,np.newaxis,:] - X[np.newaxis,:,:]) ** 2, axis = -1))

Example code that follow your process in the drawing with numpy:

import numpy as np

N = 3
shape = (N, 2)
X = np.random.random(shape) * 10

print("X:\n{}\n\n".format(X))
X_br = np.broadcast_to(X, (N, N, 2))
X_bc = np.repeat(X[:, np.newaxis], N, axis=1)
print("X broadcast to row:\n{}\n\nX broadcast to column:\n{}\n\n".format(X_br, X_bc))

sub_matrix = X_bc - X_br
distance_matrix = np.sqrt(np.sum(sub_matrix**2, axis = -1))
print("X_bc - X_br:\n{}\n\nAll distance:\n{}".format(sub_matrix, distance_matrix))

Outputs:

X:
[[2.05479961 2.74672028]
 [2.10302902 5.36635678]
 [4.82345137 9.0050768 ]]


X broadcast to row:
[[[2.05479961 2.74672028]
  [2.10302902 5.36635678]
  [4.82345137 9.0050768 ]]

 [[2.05479961 2.74672028]
  [2.10302902 5.36635678]
  [4.82345137 9.0050768 ]]

 [[2.05479961 2.74672028]
  [2.10302902 5.36635678]
  [4.82345137 9.0050768 ]]]

X broadcast to column:
[[[2.05479961 2.74672028]
  [2.05479961 2.74672028]
  [2.05479961 2.74672028]]

 [[2.10302902 5.36635678]
  [2.10302902 5.36635678]
  [2.10302902 5.36635678]]

 [[4.82345137 9.0050768 ]
  [4.82345137 9.0050768 ]
  [4.82345137 9.0050768 ]]]


X_bc - X_br:
[[[ 0.          0.        ]
  [-0.04822941 -2.6196365 ]
  [-2.76865177 -6.25835652]]

 [[ 0.04822941  2.6196365 ]
  [ 0.          0.        ]
  [-2.72042236 -3.63872002]]

 [[ 2.76865177  6.25835652]
  [ 2.72042236  3.63872002]
  [ 0.          0.        ]]]

All distance:
[[0.         2.62008043 6.8434245 ]
 [2.62008043 0.         4.54323466]
 [6.8434245  4.54323466 0.        ]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM