
Construct a distance matrix in R, but from multiple input matrices

There are some R functions that construct distance matrices from an input matrix or data frame (x) and a specified distance measure (e.g. Euclidean), such as the dist function in the stats package, which ships with R. The proxy package has a dist function (yes, the same name) that extends stats::dist: its method argument accepts a function, a registry entry, or a mnemonic string referencing the proximity measure. This is very convenient if you have your own distance measure programmed as a function. For example (from the help document in proxy):

## load proxy so that dist() refers to proxy::dist (it masks stats::dist)
library(proxy)
## input matrix
x <- matrix(rnorm(16), ncol = 4)
## custom distance function
f <- function(x, y) sum(x * y)
dist(x, f)

The resulting distance matrix indicates that, for instance, the distance between row 1 and row 2 of x is 2.32 (in my run, since x is random), which can be calculated manually as sum(x[1,]*x[2,]). Note that the function f takes two arguments, x and y, which are essentially two rows of the input matrix x passed to proxy::dist. In other words, the distance calculation relies entirely on the input matrix x alone.
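To make the "two rows in, one number out" behaviour concrete without depending on any package, here is a minimal base R sketch that applies a custom function to every pair of rows. The helper name pairwise_rows and the seed are assumptions made for illustration; this is not how proxy implements dist internally, only a loop that yields the same pairwise values.

```r
# Hypothetical helper (not part of stats or proxy): apply f to every pair of rows.
pairwise_rows <- function(x, f) {
  n <- nrow(x)
  d <- matrix(0, n, n)  # the diagonal is simply left at 0 in this sketch
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) d[i, j] <- f(x[i, ], x[j, ])
    }
  }
  d
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- matrix(rnorm(16), ncol = 4)
f <- function(x, y) sum(x * y)

d <- pairwise_rows(x, f)
d[1, 2]               # pairwise value for rows 1 and 2
sum(x[1, ] * x[2, ])  # the same value, computed by hand
```

The point to notice is that f only ever sees two rows of x, which is exactly the limitation the question runs into next.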

Here is my question: I also want to calculate a distance matrix for the input matrix x (i.e. rows are observations, and I want the pairwise distances between rows of x). However, the function I use to calculate the distance does NOT rely solely on the input matrix x, but on several matrices derived from x. I store the necessary matrices in a list called prep_matrices, which consists of three matrices, A, B and C (I made these up for reproducible results):

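set.seed(111)
A = matrix(rnorm(9), nrow = 3)
set.seed(222)
B = matrix(rnorm(9), nrow = 3)
set.seed(333)
C = matrix(rnorm(9), nrow = 3)
## collect the derived matrices in the list described above
prep_matrices = list(A = A, B = B, C = C)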

Here the input matrix x is 3-by-3, and prep_matrices$A, prep_matrices$B and prep_matrices$C are the matrices derived from x. Now assume that the distance between two rows of x (for instance, row 1 and row 2) is calculated as:

m1 = diag(A[1, ])
m2 = diag(A[2, ])
b1 = B[1, ]
b2 = B[2, ]
c1 = C[1, ]
c2 = C[2, ]
distance = mean(m1 %*% ( (diag(b1)-diag(b2)) %*% (diag(c1)-diag(c2)) %*% m2))

This example is for illustration only, but I hope it conveys how the distance is calculated. I realize that it might be impossible to pass a list (prep_matrices) to an existing R function and get the distance directly, since extra calculations are involved and, most importantly, the distance is based not on the input matrix but on several derived matrices.
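The calculation above can be wrapped in a function that receives the list and two row indices, which is one way to see that nothing prevents a distance from depending on several matrices. This is only a sketch under the question's made-up data; the helper name pair_distance is hypothetical.

```r
# Made-up matrices, as in the question
set.seed(111); A <- matrix(rnorm(9), nrow = 3)
set.seed(222); B <- matrix(rnorm(9), nrow = 3)
set.seed(333); C <- matrix(rnorm(9), nrow = 3)
prep_matrices <- list(A = A, B = B, C = C)

# Hypothetical helper: distance between rows i and j, computed from the list.
# Note: diag(v) builds a diagonal matrix only when v has length > 1.
pair_distance <- function(prep, i, j) {
  m1 <- diag(prep$A[i, ])
  m2 <- diag(prep$A[j, ])
  db <- diag(prep$B[i, ]) - diag(prep$B[j, ])
  dc <- diag(prep$C[i, ]) - diag(prep$C[j, ])
  mean(m1 %*% db %*% dc %*% m2)
}

pair_distance(prep_matrices, 1, 2)  # distance between row 1 and row 2
```

Because the function is indexed by row numbers rather than by row contents, it can look up whatever derived matrices it needs.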

Is there an efficient way to code this in R to get a distance matrix in this case? Or could we modify existing R functions? Thanks a lot!

Depending on how complicated the distance function is, you could just forget about dist and write a function that takes row numbers i, j and computes the distance between those two rows. For your example, it would look like this:

ff <- function(i, j) mean(diag(A[i, ]) %*% ((diag(B[i, ]) - diag(B[j, ])) %*% (diag(C[i, ]) - diag(C[j, ])) %*% diag(A[j, ])))

Then you could get the distance matrix by applying this to every pair of indices in 1:nrow(x), which in this case would be

distMatrix <- outer(1:3, 1:3, Vectorize(ff))

Vectorize is necessary because outer calls its function once with the full vectors of index pairs and expects a result vector of the same length, whereas ff only handles a single pair (i, j) at a time.
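Putting the pieces together, here is a self-contained sketch of the whole approach on the question's made-up matrices. It also includes an elementwise variant, ff_fast (a hypothetical name): since a product of diagonal matrices is itself diagonal, the mean over all p-by-p entries equals the sum of the diagonal divided by p^2. That shortcut holds for this illustrative distance only and is not a general recipe.

```r
# Made-up matrices, as in the question
set.seed(111); A <- matrix(rnorm(9), nrow = 3)
set.seed(222); B <- matrix(rnorm(9), nrow = 3)
set.seed(333); C <- matrix(rnorm(9), nrow = 3)

# Distance between rows i and j, written with diagonal matrices
ff <- function(i, j) {
  mean(diag(A[i, ]) %*% (diag(B[i, ]) - diag(B[j, ])) %*%
         (diag(C[i, ]) - diag(C[j, ])) %*% diag(A[j, ]))
}

idx <- seq_len(nrow(A))
distMatrix <- outer(idx, idx, Vectorize(ff))

# Hypothetical elementwise variant that avoids building p-by-p matrices
ff_fast <- function(i, j) {
  p <- ncol(A)
  sum(A[i, ] * (B[i, ] - B[j, ]) * (C[i, ] - C[j, ]) * A[j, ]) / p^2
}
distFast <- outer(idx, idx, Vectorize(ff_fast))

all.equal(distMatrix, distFast)  # the two variants agree up to rounding

# Optional: keep only the lower triangle as a "dist" object
d <- as.dist(distMatrix)
```

For this particular distance the result is symmetric with a zero diagonal, so converting it with as.dist loses no information; for an asymmetric measure the full matrix would need to be kept.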
