
Memory-efficient way to zero out the diagonal of a sparse matrix in R

I'd like to zero out the diagonal of a sparse matrix in R. My brute-force approach is to set it to zero explicitly, but this seems inefficient. Is there a more efficient way?

library(Matrix)
A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
diag(A) <- 0
A <- drop0(A)  # cleaning up

Clarification and resolution: my initial worry was that Matrix inflates the sparse matrix with explicitly stored zeros on the diagonal. This turns out not to be the case in the end, although it does happen in the interim (see the comment below). To see this, consider what happens if we set the diagonal to one instead:

A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
format(object.size(A), units = "Mb")

[1] "38.3 Mb"

diag(A) <- 1
format(object.size(A), units = "Mb")

[1] "152.7 Mb"

The many non-zero elements we have added use O(n) memory, where n is the dimension of the matrix. With diag(A) <- 0, however, we get:

diag(A) <- 0
format(object.size(A), units = "Mb")

[1] "38.3 Mb"

Namely, Matrix already handles this situation efficiently.
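The same behaviour may be easier to inspect at a smaller scale. The sketch below is an illustration rather than part of the original post: the 1000 x 1000 size, the seed, and the variable names are arbitrary choices, and it assumes the result is still a column-compressed matrix with an `x` slot. It counts the non-zero entries before and after zeroing the diagonal.

```r
library(Matrix)

set.seed(1)                       # arbitrary seed, only for reproducibility
n <- 1000                         # small illustrative dimension
A <- rsparsematrix(nrow = n, ncol = n, nnz = 5000)
diag(A) <- 1                      # put a non-zero on every diagonal position

B <- A
diag(B) <- 0
B <- drop0(B)                     # drop any explicitly stored zeros

all(diag(B) == 0)                 # every diagonal entry is now zero
nnzero(A) - nnzero(B) == n        # exactly the n diagonal entries are gone
length(B@x) == nnzero(B)          # no explicit zeros remain in storage
```

All three expressions should evaluate to TRUE if the diagonal entries were removed from storage rather than kept as explicit zeros.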

You can find the non-zero entries really quickly:

ij <- which(A != 0, arr.ind = TRUE)

# Subset to those on the diagonal:

ij <- ij[ij[,1] == ij[,2],,drop = FALSE]

# And set those entries to zero:

A[ij] <- 0
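For reuse, the three steps above could be wrapped in a small function. This is a minimal sketch: the name `zero_diag`, the guard for an already-zero diagonal, and the final `drop0()` call are additions for illustration and are not part of the original answer.

```r
library(Matrix)

# Hypothetical helper wrapping the three steps above
zero_diag <- function(A) {
  ij <- which(A != 0, arr.ind = TRUE)            # positions of the non-zero entries
  ij <- ij[ij[, 1] == ij[, 2], , drop = FALSE]   # keep only the diagonal positions
  if (nrow(ij) > 0) A[ij] <- 0                   # skip if the diagonal is already zero
  drop0(A)                                       # remove any explicitly stored zeros
}

# Small example: three of the five entries sit on the diagonal
A <- sparseMatrix(i = c(1, 2, 3, 1, 4), j = c(1, 2, 3, 4, 2),
                  x = c(5, 6, 7, 8, 9), dims = c(4, 4))
B <- zero_diag(A)
B
```

Only the two off-diagonal entries, at positions (1, 4) and (4, 2), should remain in `B`.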

Edited to add:

As the revision to the original question says, this doesn't save much memory in the end, but it is much faster. The diag(A) <- 0 statement takes about 3.2 seconds on my computer, whereas these 3 steps take about 0.2 seconds. Here's how to do the timing:

library(microbenchmark)
microbenchmark(
  A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix"),
  {A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix"); diag(A) <- 0},
  {A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
   ij <- which(A != 0, arr.ind = TRUE)
   ij <- ij[ij[, 1] == ij[, 2], , drop = FALSE]
   A[ij] <- 0}, times = 10)

When I run it, I see median timings of 137 ms for matrix creation alone, 3351 ms for creation plus the diag(A) <- 0 call, and 319 ms for creation followed by my code.

It also saves a lot of memory in the intermediate steps, which can be seen using memory profiling: Rprof(memory=TRUE); run code; Rprof(NULL); summaryRprof().
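Spelled out, the profiling recipe might look like the sketch below. Several details are assumptions for illustration: the matrix is 1e6 x 1e6 rather than 1e7 x 1e7 so that it runs quickly, the profile goes to a temporary file, and the full argument name `memory.profiling` is used (the shorter `memory = TRUE` reaches the same argument through partial matching). It also assumes an R build with memory profiling enabled.

```r
library(Matrix)

# Smaller than the 1e7 example so the sketch runs quickly (arbitrary choice)
A <- rsparsematrix(nrow = 1e6, ncol = 1e6, nnz = 1e4)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, memory.profiling = TRUE)   # start profiling with memory statistics
diag(A) <- 0                                # the code being profiled
A <- drop0(A)
Rprof(NULL)                                 # stop profiling

# summaryRprof() stops with an error if no samples were recorded, which can
# happen when the profiled code finishes within one sampling interval
prof <- tryCatch(summaryRprof(prof_file, memory = "both"),
                 error = function(e) NULL)
if (!is.null(prof)) print(prof$by.total)
```

Replacing the two profiled lines with the which()-based steps and comparing the memory columns of the two summaries would show the difference in intermediate allocations.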
