
How to calculate the Euclidean distance between two matrices in R

I have two huge matrices with equal dimensions and want to calculate the Euclidean distance between them. I know this is the function:

euclidean_distance <- function(p,q){
  sqrt(sum((p - q)^2))
}
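As a quick sanity check of the function above, here it is applied to two short made-up vectors chosen so the answer is easy to verify by hand: (0, 0) and (3, 4) form a 3-4-5 right triangle. Note that because the function sums over every element, calling it on two whole matrices returns a single number, not a pairwise table.

```r
euclidean_distance <- function(p, q) {
  # square the element-wise differences, add them up, take the square root
  sqrt(sum((p - q)^2))
}

# sqrt(3^2 + 4^2) = sqrt(25) = 5
d <- euclidean_distance(c(0, 0), c(3, 4))
print(d)
```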

Suppose these are the two matrices (built here as data frames):


set.seed(123)
mat1 <- data.frame(x=sample(1:10000,3), 
                   y=sample(1:10000,3), 
                   z=sample(1:10000,3))
mat2 <- data.frame(x=sample(1:100,3), 
                   y=sample(1:100,3), 
                   z=sample(1:1000,3))

I need the result to be a new 3 x 3 matrix giving the Euclidean distance between each pair of columns of mat1 and mat2.
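To make the desired result concrete, here is a slow but transparent double loop (a baseline sketch, not taken from any of the answers). It reads "each pair" as each pair of columns, which is how the answers interpret it: entry [i, j] is the distance between column i of mat1 and column j of mat2. The exact numbers depend on the R version's `sample()` behaviour, so no output is shown.

```r
set.seed(123)
mat1 <- data.frame(x = sample(1:10000, 3),
                   y = sample(1:10000, 3),
                   z = sample(1:10000, 3))
mat2 <- data.frame(x = sample(1:100, 3),
                   y = sample(1:100, 3),
                   z = sample(1:1000, 3))

euclidean_distance <- function(p, q) sqrt(sum((p - q)^2))

# Rows of res are labelled by mat1's columns, columns by mat2's columns
res <- matrix(NA_real_, nrow = ncol(mat1), ncol = ncol(mat2),
              dimnames = list(names(mat1), names(mat2)))
for (i in seq_len(ncol(mat1))) {
  for (j in seq_len(ncol(mat2))) {
    res[i, j] <- euclidean_distance(mat1[[i]], mat2[[j]])
  }
}
print(res)
```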

Any suggestions?

You can use the pdist package:

library(pdist)
dists <- pdist(t(mat1), t(mat2))
as.matrix(dists)
         [,1]      [,2]      [,3]
[1,]  9220.40  9260.735  8866.033
[2,] 12806.35 12820.086 12121.927
[3,] 11630.86 11665.869 11155.823

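This gives you the Euclidean distances for all column pairs: (mat1$x, mat2$x), (mat1$x, mat2$y), ..., (mat1$z, mat2$z). pdist() computes distances between rows, which is why both inputs are transposed with t() first.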
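If you want to cross-check the pdist result without the package, base R's `stats::dist()` can do it: stack the transposed inputs, compute all pairwise row distances, and keep only the off-diagonal block that compares rows from mat1 with rows from mat2. This is a swapped-in technique for verification only; it builds an (n + m) x (n + m) matrix, so it wastes memory on genuinely huge inputs.

```r
set.seed(123)
mat1 <- data.frame(x = sample(1:10000, 3), y = sample(1:10000, 3),
                   z = sample(1:10000, 3))
mat2 <- data.frame(x = sample(1:100, 3), y = sample(1:100, 3),
                   z = sample(1:1000, 3))

# Columns become rows, matching the t() calls passed to pdist()
a <- t(as.matrix(mat1))
b <- t(as.matrix(mat2))

# dist() gives every pairwise row distance of the stacked matrix;
# keep only the block "rows of a" x "rows of b"
full <- as.matrix(dist(rbind(a, b)))
n <- nrow(a)
cross <- full[seq_len(n), n + seq_len(nrow(b))]
print(cross)
```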

This is a job for the base function outer:

outer(mat1,mat2,Vectorize(euclidean_distance))
         x         y         z
x  9220.40  9260.736  8866.034
y 12806.35 12820.086 12121.927
z 11630.86 11665.869 11155.823
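The outer() call works on the question's data because a data frame is a list of columns, so `Vectorize(euclidean_distance)` is handed whole columns. A plain numeric matrix is not a list, and outer() would pair up individual elements instead. A minimal sketch for plain matrices, assuming you want column-versus-column distances, first turns each matrix into a list of its columns (`as_columns`, `m1` and `m2` are hypothetical names used only for this illustration):

```r
euclidean_distance <- function(p, q) sqrt(sum((p - q)^2))

# Hypothetical example matrices (plain numeric, not data frames)
set.seed(1)
m1 <- matrix(runif(12), nrow = 4)  # 4 rows, 3 columns
m2 <- matrix(runif(8), nrow = 4)   # 4 rows, 2 columns

# Mimic a data frame's structure: a list whose elements are the columns
as_columns <- function(m) lapply(seq_len(ncol(m)), function(j) m[, j])

# Entry [i, j] is the distance between column i of m1 and column j of m2
res <- outer(as_columns(m1), as_columns(m2), Vectorize(euclidean_distance))
print(dim(res))  # 3 2
```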

By combining outer and tcrossprod, you can compute the distances between the rows of two matrices slightly faster in base R than with pdist::pdist:
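The speed-up rests on expanding the squared distance between row i of mat1 (written a_i) and row j of mat2 (written b_j), so that no explicit loop over pairs is needed. `rowSums(mat1^2)` and `rowSums(mat2^2)` supply the two squared norms, `outer(..., "+")` lays them out so that entry (i, j) holds their sum, and `tcrossprod(mat1, 2*mat2)`, which equals `mat1 %*% t(2*mat2)`, supplies the cross term:

```latex
\lVert a_i - b_j \rVert^2 = \lVert a_i \rVert^2 + \lVert b_j \rVert^2 - 2\, a_i^\top b_j
```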

> library(Rfast);library(pdist);library(microbenchmark)
> mat1=matrix(sample(1e4),ncol=10);mat2=matrix(sample(2e4),ncol=10)
> microbenchmark(times=10,dista={Rfast::dista(mat1,mat2)},
+ tcrossprod={sqrt(outer(rowSums(mat1^2),rowSums(mat2^2),"+")-tcrossprod(mat1,2*mat2))},
+ pdist=pdist::pdist(mat1,mat2))
 Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval
      dista 36.54904 36.92631 41.87249 40.65317 44.69080 55.03735    10
 tcrossprod 27.93673 31.27754 37.89734 36.94241 42.85898 52.00793    10
      pdist 37.83836 39.03578 43.54179 44.04779 46.86191 50.45233    10
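One caveat with the tcrossprod expression (an observation about floating point, not part of the answer): when two rows are identical or nearly so, the subtraction can cancel to a tiny negative number and `sqrt()` then returns NaN. A minimal sketch wrapping the same expression, with `pmax()` clamping such values to zero (`pairwise_row_dist` is a hypothetical helper name):

```r
# Distances between every row of a and every row of b via the expansion
# ||a_i||^2 + ||b_j||^2 - 2 a_i . b_j; pmax() clamps tiny negative values
# produced by floating-point cancellation before taking the square root
pairwise_row_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - tcrossprod(a, 2 * b)
  sqrt(pmax(sq, 0))
}

set.seed(42)
a <- matrix(rnorm(20), ncol = 4)                 # 5 rows
b <- rbind(a[1, ], matrix(rnorm(12), ncol = 4))  # 4 rows; row 1 equals a[1, ]

d <- pairwise_row_dist(a, b)

# Cross-check one entry against the direct definition
direct <- sqrt(sum((a[3, ] - b[2, ])^2))
print(all.equal(d[3, 2], direct))
print(d[1, 1])  # a row against an identical row: zero or very close to it
```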

This isn't what the OP asked, but if you have two matrices with the same dimensions, the following returns a vector of distances between the first rows of the two matrices, the second rows, and so on:

sqrt(rowSums((mat1 - mat2)^2))  # sqrt() is already vectorised, so sapply() is not needed
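The same one-liner with `colSums()` instead of `rowSums()` compares matching columns (x with x, y with y, z with z). On the question's data that gives exactly the diagonal of the 3 x 3 matrix the OP asked for, which makes it a cheap spot check. A small sketch, assuming the data frames are converted to numeric matrices first:

```r
set.seed(123)
mat1 <- data.frame(x = sample(1:10000, 3), y = sample(1:10000, 3),
                   z = sample(1:10000, 3))
mat2 <- data.frame(x = sample(1:100, 3), y = sample(1:100, 3),
                   z = sample(1:1000, 3))

m1 <- as.matrix(mat1)
m2 <- as.matrix(mat2)

# Row-wise: distance between row k of m1 and row k of m2
row_d <- sqrt(rowSums((m1 - m2)^2))

# Column-wise: x vs x, y vs y, z vs z (the diagonal of the pairwise matrix)
col_d <- sqrt(colSums((m1 - m2)^2))
print(row_d)
print(col_d)
```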

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM