
Pairwise distance between two matrices in R

I want to compute distances between the rows of two matrices in R. In this example I use the Manhattan distance, but I would like to apply other formulas as well. My question is: is there a way to apply a function to each row of one matrix paired with each row of another matrix in R?

In this example I have only two variables, but I would like to apply this to more than 10 variables.

Thanks.

set.seed(123)
mat1 <- data.frame(x=sample(1:10000,3), 
                   z=sample(1:10000,3))
mat2 <- data.frame(x=sample(1:100,3), 
                   z=sample(1:1000,3))

dista<-matrix(0,ncol=2,nrow=2)
for (j in 1:nrow(mat1)){
  for(i in 1:nrow(mat2)){
    dista[i,j]<-sqrt((mat1[i,1]-mat2[j,1]) + (mat1[i,2]-mat2[j,2]))
  }
}

dista

You can use the proxy package for these problems. By default, proxy::dist considers each row of a matrix or data frame as a single "object".

library(proxy)

proxy::dist(mat1, mat2, method="Manhattan")
     [,1]  [,2]  [,3] 
[1,]  4804  4832  4656
[2,]  3708  3736  3560
[3,] 17407 17435 17259

proxy::dist(mat1, mat2, method="Euclidean")
     [,1]      [,2]      [,3]     
[1,]  3397.036  3417.059  3295.962
[2,]  2761.996  2787.495  2708.075
[3,] 12308.674 12328.422 12204.286

Type vignette("overview", "proxy") in the R console to see which similarities and distances it includes, and check the documentation of proxy::pr_DB if you would like to add your own functions that can be used with proxy::dist.
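For a one-off formula, registering an entry in pr_DB may not be necessary: as far as I know, proxy::dist also accepts a plain function of two vectors as its method argument. The sketch below assumes that behaviour and uses a hypothetical chebyshev function and small made-up matrices a and b (none of these names come from the question).

```r
library(proxy)

# Small made-up inputs: 2 rows and 3 rows, 3 variables each.
a <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
b <- matrix(c(0, 0, 0,
              1, 1, 1,
              2, 4, 6), nrow = 3, byrow = TRUE)

# Hypothetical custom measure: largest absolute coordinate difference.
chebyshev <- function(u, v) max(abs(u - v))

# Assumption: `method` may be a function taking one row from each input.
d <- proxy::dist(a, b, method = chebyshev)
d
```

The result should have one row per row of a and one column per row of b, just like the Manhattan and Euclidean outputs above.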

Your code is still wrong: the output should have length nrow(mat1) * nrow(mat2), which is 9, and that cannot fit in the 2x2 matrix you defined earlier. Also, i should run through mat1 and j through mat2; you have it the other way around. Replacing the dista[i,j] <- assignment with a print(), you'd obtain:

dista<-matrix(0,ncol=2,nrow=2)
for (i in 1:nrow(mat1)){
    for(j in 1:nrow(mat2)){
        print(sqrt((mat1[i,1]-mat2[j,1]) + (mat1[i,2]-mat2[j,2])))
    }
}
[1] 105.8159
[1] 129.5261
[1] 63.52165
[1] 103.257
[1] 127.4441
[1] 59.1608
[1] 105.8253
[1] 129.5338
[1] 63.53739
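Putting both points together (a result matrix sized nrow(mat1) by nrow(mat2), with i indexing the first input and j the second), one possible sketch is a small helper that takes the distance formula as an argument. The names pairwise_rows, manhattan and euclidean are hypothetical, and the two formulas are the textbook definitions, which differ from the expression inside sqrt() in the loop above.

```r
# Hypothetical helper: apply `fun` to every (row of a, row of b) pair.
pairwise_rows <- function(a, b, fun) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  # One row per row of `a`, one column per row of `b`.
  out <- matrix(NA_real_, nrow = nrow(a), ncol = nrow(b))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      out[i, j] <- fun(a[i, ], b[j, ])
    }
  }
  out
}

# Textbook definitions (assumed to be what is wanted).
manhattan <- function(u, v) sum(abs(u - v))
euclidean <- function(u, v) sqrt(sum((u - v)^2))

# Small made-up inputs with three variables.
a <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
b <- matrix(c(0, 0, 0,
              1, 1, 1,
              2, 4, 6), nrow = 3, byrow = TRUE)

res_manhattan <- pairwise_rows(a, b, manhattan)
res_euclidean <- pairwise_rows(a, b, euclidean)
res_manhattan
```

Because the helper indexes whole rows, it works unchanged with 10 or more columns; calling pairwise_rows(mat1, mat2, manhattan) on the data frames from the question should give a 3x3 matrix.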

You can use outer to limit the calculations to just one vectorised function:

y = outer(1:nrow(mat1),1:nrow(mat2),paste)
y
     [,1]  [,2]  [,3] 
[1,] "1 1" "1 2" "1 3"
[2,] "2 1" "2 2" "2 3"
[3,] "3 1" "3 2" "3 3"

sapply(as.vector(y), function(x){
  aux = as.numeric(strsplit(x," ")[[1]])
  sqrt((mat1[aux[1],1]-mat2[aux[2],1]) + (mat1[aux[1],2]-mat2[aux[2],2]))})

      1 1       2 1       3 1       1 2       2 2       3 2       1 3       2 3       3 3 
105.81588 129.52606  63.52165 103.25696 127.44411  59.16080 105.82533 129.53378  63.53739 

Here, we first create a matrix y that contains all the i and j combinations, feed it to sapply, and then split each entry to get i and j individually.
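A variation on the same idea, sketched here as one option rather than the definitive approach, is to pass the row indices to outer directly and wrap the pair function in Vectorize, which avoids pasting and re-splitting strings and returns the result already shaped as a matrix. The names manhattan and pair_dist and the matrices a and b are made up for illustration.

```r
# Small made-up inputs with three variables.
a <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
b <- matrix(c(0, 0, 0,
              1, 1, 1,
              2, 4, 6), nrow = 3, byrow = TRUE)

# Textbook Manhattan distance between two vectors.
manhattan <- function(u, v) sum(abs(u - v))

# outer() hands whole index vectors to its function, so Vectorize()
# is used to call the scalar (i, j) function once per pair.
pair_dist <- Vectorize(function(i, j) manhattan(a[i, ], b[j, ]))

d_outer <- outer(seq_len(nrow(a)), seq_len(nrow(b)), pair_dist)
d_outer
```

Note that Vectorize still loops internally (via mapply), so this is tidier than the string approach but not necessarily faster than the explicit loops.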
