R package Matrix: get the number of non-zero entries per row / column of a sparse matrix

Question

I have a large sparse matrix ("dgCMatrix", dimension 5e+5 x 1e+6). For each column I need to count how many non-zero values there are, and then make a list of the names of the columns that have exactly 1 non-zero entry.

My code works for small matrices, but becomes too computationally intensive for the actual matrix I need to work on.

library(Matrix)
set.seed(0)
mat <- Matrix(matrix(rbinom(200, 1, 0.10), ncol = 20))
colnames(mat) <- letters[1:20]

entries <- colnames(mat[, nrow(mat) - colSums(mat == 0) == 1])

Any suggestion is very welcome!

Answer 1

The following produces the same result. Please note the comments:

## `mat != 0` returns a "lgCMatrix" which is sparse
## don't try `mat == 0` as that is dense, simply because there are too many zeros
entries <- colnames(mat)[colSums(mat != 0) == 1]
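As a minimal sketch of the approach above, here it is on a small hypothetical matrix m (made up for illustration, not the asker's data). The comparison m != 0 stays sparse, and its column sums are the per-column non-zero counts.

```r
library(Matrix)

# Hypothetical 4 x 3 sparse matrix with named columns
m <- sparseMatrix(i = c(1, 3, 2, 4, 4), j = c(1, 1, 2, 2, 3),
                  x = c(5, 2, 7, 1, 9), dims = c(4, 3),
                  dimnames = list(NULL, c("a", "b", "c")))

nz <- m != 0                         # logical sparse matrix, only TRUE cells stored
counts <- colSums(nz)                # number of non-zero entries per column
single <- colnames(m)[counts == 1]   # columns with exactly one non-zero entry
```

Here columns a and b each hold two non-zero values and column c holds one, so single contains only "c".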

Answer 2

I have a large sparse matrix ("dgCMatrix")

Let us call it dgCMat.

I need to count for each column how many non-zero values there are

xx <- diff(dgCMat@p)

and make a list of column names with only 1 non-zero entry

colnames(dgCMat)[xx == 1]
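Why this works: in a "dgCMatrix" the p slot holds a leading 0 followed by the running total of stored entries up to the end of each column, so successive differences give the number of stored entries per column. A small sketch on a hypothetical matrix m (made up for illustration):

```r
library(Matrix)

# Hypothetical 4 x 3 sparse matrix with named columns
m <- sparseMatrix(i = c(1, 3, 2, 4, 4), j = c(1, 1, 2, 2, 3),
                  x = c(5, 2, 7, 1, 9), dims = c(4, 3),
                  dimnames = list(NULL, c("a", "b", "c")))

ptr <- m@p                           # column pointers: 0 2 4 5
xx <- diff(ptr)                      # stored entries per column: 2 2 1
single <- colnames(m)[xx == 1]       # "c"
```

Note that this counts stored entries. If the matrix might contain explicitly stored zeros, calling drop0() on it first removes them so that the counts are true non-zero counts.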

Summary

nnz: number of non-zeros

For a "dgCMatrix" dgCMat:

## nnz per column
diff(dgCMat@p)

## nnz per row
tabulate(dgCMat@i + 1)

For a "dgRMatrix" dgRMat:

## nnz per column
tabulate(dgRMat@j + 1)

## nnz per row
diff(dgRMat@p)

For a "dgTMatrix" dgTMat:

## nnz per column
tabulate(dgTMat@j + 1)

## nnz per row
tabulate(dgTMat@i + 1)

I did not read your original code when posting this answer, so I did not know that you were stuck on the use of mat == 0. Only later did I add the note on the difference between mat == 0 and mat != 0 to your answer.

Your workaround using mat != 0 makes good use of the package's features. That same line of code should work with other sparse matrix classes, too. Mine goes straight to the internal storage, so a different version is required for each class.

Answer 3

I don't have the rep to comment on the accepted answer, but I wanted to point out that the tabulate-based functions fall apart if there are no non-zero entries in the last few columns / rows: tabulate stops at the largest index it sees, so the result comes out too short.

This can be fixed by specifying the number of bins to tabulate:

## nnz per row
tabulate(dgCMat@i + 1, nbins=nrow(dgCMat))

## nnz per column
tabulate(dgRMat@j + 1, nbins=ncol(dgRMat))
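A minimal sketch of the problem and the fix, using a hypothetical 4 x 4 matrix (made up for illustration) whose last row and last column are empty. The same nbins argument applies to the triplet ("dgTMatrix") form as well.

```r
library(Matrix)

# Hypothetical 4 x 4 matrix: rows 3-4 and columns 2 and 4 are empty
m_c <- sparseMatrix(i = c(1, 2, 2), j = c(1, 1, 3),
                    x = c(1, 2, 3), dims = c(4, 4))
m_t <- as(m_c, "TsparseMatrix")      # triplet form of the same matrix

short  <- tabulate(m_c@i + 1)                       # 1 2 (too short)
rows_c <- tabulate(m_c@i + 1, nbins = nrow(m_c))    # 1 2 0 0
cols_t <- tabulate(m_t@j + 1, nbins = ncol(m_t))    # 2 0 1 0
cols_c <- diff(m_c@p)                               # 2 0 1 0, always full length
```

The diff-based counts need no such fix, because the p slot always has one more element than the number of columns (or rows, for a "dgRMatrix").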

Answer 1
3 2018-07-27 15:01:06

Answer 2
1 accepted 2018-07-27 14:57:30

Answer 3
1 2019-11-22 13:02:59
