Write a Sparse Matrix to a CSV in R
I have a sparse matrix (dgCMatrix) as the result of fitting a glmnet. I want to write this result to a .csv but can't use write.table() on the matrix because it can't be coerced into a data.frame.
Is there a way to coerce the sparse matrix to either a data.frame or a regular matrix? Or is there a way to write it to a file while keeping the coefficient names, which are probably row names?
Converting the sparse matrix to a dense one is dangerous if the matrix is large. In my case (a text classification task), I had a matrix of size 22,490 by 120,000. If you try to build the dense matrix, it will take more than 20 GB, I think, and R will crash.
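The 20 GB figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes the dense matrix stores every entry as an 8-byte double, which is how a regular numeric matrix in R is stored; the variable names are only for illustration.

```r
# Dimensions quoted above
n_row <- 22490
n_col <- 120000

# A dense numeric matrix needs 8 bytes per entry (ignoring the small header)
dense_bytes <- n_row * n_col * 8

dense_bytes / 1e9    # size in GB (decimal)
dense_bytes / 2^30   # size in GiB (binary)
```

Both figures come out above 20, which matches the estimate in the answer.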
So my suggestion is to store the sparse matrix in an efficient, memory-friendly format such as Matrix Market, which keeps only the non-zero values and their coordinates (row and column numbers). In R you can use the writeMM method.
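A minimal sketch of the Matrix Market route, using writeMM() and readMM() from the Matrix package. As far as I know, the Matrix Market file holds only the dimensions and the non-zero entries, not the dimnames, so this sketch saves the row and column names in two separate text files; that part is my own workaround, not something the format provides. The small matrix is just a stand-in for real data.

```r
library(Matrix)

# Small named sparse matrix standing in for the real one
dn <- list(LETTERS[1:3], letters[1:5])
m <- sparseMatrix(i = c(3, 1, 3, 2, 2, 1), p = c(0:2, 4, 4, 6),
                  x = 1:6, dimnames = dn)

# Write the non-zero entries in Matrix Market format
mtx_file <- tempfile(fileext = ".mtx")
writeMM(m, file = mtx_file)

# Save the names separately, one per line (assumed workaround)
row_file <- tempfile(fileext = ".txt")
col_file <- tempfile(fileext = ".txt")
writeLines(rownames(m), row_file)
writeLines(colnames(m), col_file)

# Read everything back and restore the names
m2 <- as(readMM(mtx_file), "CsparseMatrix")
dimnames(m2) <- list(readLines(row_file), readLines(col_file))
```

For a glmnet result, the file of row names is where the coefficient names would end up.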
as.matrix() will convert to the full dense representation:
> as.matrix(Matrix(0, 3, 2))
     [,1] [,2]
[1,]    0    0
[2,]    0    0
[3,]    0    0
You can write the resulting object out using write.csv or write.table.
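For a coefficient matrix that is small enough to hold densely, the whole round trip might look like the sketch below. To keep it self-contained, it does not fit a glmnet; instead it builds a one-column dgCMatrix by hand as a stand-in for the coefficients, and the names and values are made up.

```r
library(Matrix)

# Hypothetical stand-in for a glmnet coefficient matrix:
# one column, coefficient names stored as row names
beta <- sparseMatrix(i = c(1, 3), j = c(1, 1), x = c(0.5, -1.2),
                     dims = c(4, 1),
                     dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s0"))

# Dense copy; dimnames are carried over
dense <- as.matrix(beta)

# write.csv() writes the row names as the first column by default
csv_file <- tempfile(fileext = ".csv")
write.csv(dense, file = csv_file)

# Read back, using the first column as row names again
back <- read.csv(csv_file, row.names = 1, check.names = FALSE)
```

The zero coefficients are written out too, so this is only sensible when the dense size is manageable.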
Converting directly to a dense matrix is likely to waste a lot of memory. The R package Matrix can convert the sparse matrix into a memory-efficient data frame in coordinate triplet format using the summary() function, and that data frame can then be written easily to CSV. This is probably simpler than the Matrix Market approach. See the answer to this related question: Sparse matrix to a data frame in R
Also, here is an illustration from the Matrix package documentation:
## very simple export - in triplet format - to text file:
data(CAex)
s.CA <- summary(CAex)
s.CA # shows (i, j, x) [columns of a data frame]
message("writing to ", outf <- tempfile())
write.table(s.CA, file = outf, row.names=FALSE)
## and read it back -- showing off sparseMatrix():
str(dd <- read.table(outf, header=TRUE))
## has columns (i, j, x) -> we can use via do.call() as arguments to sparseMatrix():
mm <- do.call(sparseMatrix, dd)
stopifnot(all.equal(mm, CAex, tolerance=1e-15))
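# input: a sparse matrix with named rows and columns (dimnames)
# returns: a data frame of triplets (row, col, x) suitable for writing to a CSV file
sparse2triples <- function(m) {
  SM <- summary(m)                 # data frame with columns i, j, x
  D1 <- rownames(m)[SM$i]          # row names of the non-zero entries
  D2 <- colnames(m)[SM$j]          # column names of the non-zero entries
  # take x from the same summary so values stay aligned with the indices
  data.frame(row = D1, col = D2, x = SM$x)
}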
> library(Matrix)
> dn <- list(LETTERS[1:3], letters[1:5])
> m <- sparseMatrix(i = c(3,1,3,2,2,1), p= c(0:2, 4,4,6), x = 1:6, dimnames = dn)
> m
3 x 5 sparse Matrix of class "dgCMatrix"
  a b c d e
A . 2 . . 6
B . . 4 . 5
C 1 . 3 . .
> sparse2triples(m)
  row col x
1   C   a 1
2   A   b 2
3   B   c 4
4   C   c 3
5   A   e 6
6   B   e 5
[EDIT: use data.frame]
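Going the other way, named triplets read back from a CSV file can be turned into a sparse matrix again. The sketch below assumes the full sets of row and column names are known (here they are kept in dn), because rows or columns that contain only zeros, such as column d in the example, never appear in the triplet file.

```r
library(Matrix)

# Same example matrix as above
dn <- list(LETTERS[1:3], letters[1:5])
m <- sparseMatrix(i = c(3, 1, 3, 2, 2, 1), p = c(0:2, 4, 4, 6),
                  x = 1:6, dimnames = dn)

# Named triplets, built from summary(), written to CSV
s <- summary(m)
triples <- data.frame(row = rownames(m)[s$i],
                      col = colnames(m)[s$j],
                      x = s$x)
csv_file <- tempfile(fileext = ".csv")
write.csv(triples, csv_file, row.names = FALSE)

# Read back and map the names to integer positions
dd <- read.csv(csv_file, stringsAsFactors = FALSE)
m2 <- sparseMatrix(i = match(dd$row, dn[[1]]),
                   j = match(dd$col, dn[[2]]),
                   x = dd$x,
                   dims = lengths(dn), dimnames = dn)
```

Passing dims and dimnames explicitly is what restores the all-zero column; without them the result would be sized by the largest index found in the file.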