How to efficiently extract a row or column from a “dist” distance matrix
I'm working with 39000+ data points and computing the distance between each point and every other one, which gives a (39000+)^2 matrix that takes 11GB (more than I can allocate in memory). Fortunately, the dist function stores only the lower triangle, which brings this down to a little under 6GB. But now I need to compute the inverse squared distances and then normalize every row so that it sums to 1. This is necessary because I will later multiply every row of the matrix by a vector and store the result. So the big matrix is only a temporary object.
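The memory figures above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes exactly 39000 points (the real count is "39000+") and 8 bytes per stored double, and it ignores object overhead.

```r
# Assumption: exactly 39000 points and 8-byte doubles (the real count is "39000+").
n <- 39000
bytes_per_double <- 8

# Full n x n matrix versus the packed lower triangle kept by dist(), in GiB.
full_gib <- n^2 * bytes_per_double / 2^30
dist_gib <- n * (n - 1) / 2 * bytes_per_double / 2^30

round(c(full = full_gib, dist = dist_gib), 1)   # roughly 11.3 and 5.7
```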
My question is: how can I extract rows of this dist matrix?
A sample "dist" matrix obtained with dist(cbind(runif(5),runif(5))):
1 2 3 4
2 0.47
3 0.63 0.72
4 0.79 0.62 0.37
5 0.53 0.15 0.62 0.48
What I'm looking for is a way to extract the entire first row, for instance:
0 0.47 0.63 0.79 0.53
Resort to the function f from my old answer.
f <- function (i, j, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  ## only the strict lower triangle (i > j) is stored
  valid <- (i >= 1) & (j >= 1) & (i > j) & (i <= n) & (j <= n)
  ## position of element (i, j) in the column-wise packed lower triangle
  k <- (2 * n - j) * (j - 1) / 2 + (i - j)
  k[!valid] <- NA_real_
  k
}
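To see what the index formula in f does, here is a small self-contained check. It is a sketch that applies the same formula by hand rather than calling f; the pair (i, j) = (4, 2) is an arbitrary choice for illustration. A "dist" object keeps the lower triangle column by column, so with n = 5 the first column occupies positions 1 to 4 and element (4, 2) should sit at position 6.

```r
set.seed(0)
d <- dist(cbind(runif(5), runif(5)))
n <- attr(d, "Size")

# Arbitrary lower-triangle element chosen for illustration (i > j).
i <- 4
j <- 2

# Same formula as in f: skip the first (j - 1) columns, then move down to row i.
k <- (2 * n - j) * (j - 1) / 2 + (i - j)
k                     # 6

# The packed vector and the full matrix should agree on this element.
d[k]
as.matrix(d)[i, j]
```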
A helper function to extract a single row / column (a slice).
SliceExtract_dist <- function (dist_obj, k) {
  if (length(k) > 1) stop("The function is not 'vectorized'!")
  n <- attr(dist_obj, "Size")
  if (k < 1 || k > n) stop("k out of bound!")
  ## entries (k, 1), ..., (k, k - 1), to the left of the diagonal
  i <- seq_len(k - 1)
  j <- rep.int(k, k - 1)
  v1 <- dist_obj[f(j, i, dist_obj)]
  ## entries (k + 1, k), ..., (n, k), below the diagonal
  i <- k + seq_len(n - k)
  j <- rep.int(k, n - k)
  v2 <- dist_obj[f(i, j, dist_obj)]
  ## the diagonal entry is always 0
  c(v1, 0, v2)
}
Example
set.seed(0)
( d <- dist(cbind(runif(5),runif(5))) )
# 1 2 3 4
#2 0.9401067
#3 0.9095143 0.1162289
#4 0.5618382 0.3884722 0.3476762
#5 0.4275871 0.6968296 0.6220650 0.3368478
SliceExtract_dist(d, 1)
#[1] 0.0000000 0.9401067 0.9095143 0.5618382 0.4275871
SliceExtract_dist(d, 2)
#[1] 0.9401067 0.0000000 0.1162289 0.3884722 0.6968296
SliceExtract_dist(d, 3)
#[1] 0.9095143 0.1162289 0.0000000 0.3476762 0.6220650
SliceExtract_dist(d, 4)
#[1] 0.5618382 0.3884722 0.3476762 0.0000000 0.3368478
SliceExtract_dist(d, 5)
#[1] 0.4275871 0.6968296 0.6220650 0.3368478 0.0000000
Sanity check
as.matrix(d)
# 1 2 3 4 5
#1 0.0000000 0.9401067 0.9095143 0.5618382 0.4275871
#2 0.9401067 0.0000000 0.1162289 0.3884722 0.6968296
#3 0.9095143 0.1162289 0.0000000 0.3476762 0.6220650
#4 0.5618382 0.3884722 0.3476762 0.0000000 0.3368478
#5 0.4275871 0.6968296 0.6220650 0.3368478 0.0000000
Note: A function to extract diagonals readily exists.
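Coming back to the original goal, row extraction makes it possible to build the normalized inverse-squared-distance weights one row at a time, without ever forming the full matrix. The following is only a sketch under stated assumptions: dist_row is a hypothetical compact helper that uses the same index formula as f, the vector z holds made-up values, and a point's weight on itself is set to 0, which the question does not specify. Duplicate points (zero distances off the diagonal) would need separate handling.

```r
# Hypothetical helper: row k of a "dist" object as a full-length vector.
dist_row <- function(d, k) {
  n <- attr(d, "Size")
  # position of element (i, j), i > j, in the packed lower triangle
  idx <- function(i, j) (2 * n - j) * (j - 1) / 2 + (i - j)
  lo <- seq_len(k - 1)        # columns to the left of the diagonal
  hi <- k + seq_len(n - k)    # rows below the diagonal
  c(d[idx(k, lo)], 0, d[idx(hi, k)])
}

set.seed(0)
d <- dist(cbind(runif(5), runif(5)))
z <- c(10, 20, 30, 40, 50)    # made-up values to be weighted
n <- attr(d, "Size")

est <- numeric(n)
for (k in seq_len(n)) {
  w <- 1 / dist_row(d, k)^2   # inverse squared distances (Inf on the diagonal)
  w[k] <- 0                   # assumption: a point gets no weight on itself
  w <- w / sum(w)             # normalize the row so that it sums to 1
  est[k] <- sum(w * z)        # row times vector; only this scalar is kept
}
est
```

Each iteration touches only one row of length n, so peak memory stays at the size of the "dist" object plus a few vectors.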