Write a Sparse Matrix to a CSV in R

I have a sparse matrix (dgCMatrix) as the result of fitting a glmnet model. I want to write this result to a .csv file, but I can't use write.table() on the matrix because it can't be coerced into a data.frame.

Is there a way to coerce the sparse matrix to either a data.frame or a regular matrix? Or is there a way to write it to a file while keeping the coefficient names, which are probably the row names?

Transforming the sparse matrix into a normal (dense) one is dangerous if the matrix is large. In my case (a text classification task), I got a matrix of size 22,490 by 120,000. The dense version would hold about 2.7 billion doubles at 8 bytes each, i.e. more than 20 GB. Then R will crash!
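
The 20 GB figure follows from simple arithmetic. This is a rough sketch, assuming double-precision values and the usual dgCMatrix layout (a double in slot x and an integer row index in slot i per non-zero, plus an integer column pointer array p of length ncol + 1). The helpers dense_bytes() and sparse_bytes() are hypothetical, and the 10 million non-zero count is made up because the actual count isn't stated.

```r
# Rough memory estimates in bytes, ignoring small per-object overhead.

# Dense double matrix: one 8-byte double per cell.
dense_bytes <- function(nrow, ncol) nrow * ncol * 8

# dgCMatrix: 8-byte value (@x) + 4-byte row index (@i) per non-zero,
# plus one 4-byte column pointer (@p) per column, plus one.
sparse_bytes <- function(nnz, ncol) nnz * (8 + 4) + (ncol + 1) * 4

dense_bytes(22490, 120000) / 1e9   # about 21.6 GB

# Hypothetical non-zero count (not given in the answer):
sparse_bytes(1e7, 120000) / 1e9    # about 0.12 GB
```

The sparse cost grows with the number of non-zeros rather than with rows times columns, which is why keeping the data sparse all the way to disk matters.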

So my suggestion is to store the sparse matrix in an efficient, memory-friendly format such as Matrix Market format, which keeps only the non-zero values and their coordinates (row and column numbers). In R you can use the writeMM() function from the Matrix package.
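
A minimal round-trip sketch with writeMM() and readMM() from the Matrix package. As far as I know, a Matrix Market file stores only the dimensions, coordinates and values, not the dimnames, so this sketch writes the row and column names to separate plain-text files. The example matrix and the ".rows"/".cols" file naming are made up for illustration.

```r
library(Matrix)

# Small example matrix with dimnames (made up for illustration)
m <- sparseMatrix(i = c(3, 1, 3, 2, 2, 1), j = c(1, 2, 3, 3, 5, 5),
                  x = c(1, 2, 3, 4, 5, 6),
                  dimnames = list(LETTERS[1:3], letters[1:5]))

mm_file <- tempfile(fileext = ".mtx")
writeMM(m, file = mm_file)                          # non-zeros + coordinates
writeLines(rownames(m), paste0(mm_file, ".rows"))   # names kept separately
writeLines(colnames(m), paste0(mm_file, ".cols"))

# readMM() returns a triplet-form sparse matrix without names;
# convert back to column-compressed form and restore the dimnames.
m2 <- as(readMM(mm_file), "CsparseMatrix")
dimnames(m2) <- list(readLines(paste0(mm_file, ".rows")),
                     readLines(paste0(mm_file, ".cols")))
```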

as.matrix() will convert to the full dense representation:

> as.matrix(Matrix(0, 3, 2))
     [,1] [,2]
[1,]    0    0
[2,]    0    0
[3,]    0    0

You can write the resulting object out using write.csv or write.table.
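
For the glmnet case in the question, the coefficient matrix is usually small (one column per lambda value), so densifying it is cheap, and write.csv() writes the row names (the coefficient names) by default. In this sketch a hand-built one-column dgCMatrix stands in for coef(fit), so it runs without glmnet; the names "(Intercept)", "x1" to "x3" and "s0" are hypothetical.

```r
library(Matrix)

# Stand-in for coef(fit): one-column dgCMatrix, coefficient names as rownames
coefs <- sparseMatrix(i = c(1, 3), j = c(1, 1), x = c(0.5, -1.2), dims = c(4, 1),
                      dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s0"))

out <- tempfile(fileext = ".csv")
write.csv(as.matrix(coefs), out)    # dense copy; row names are written too

# Reading it back: the first CSV column holds the coefficient names
back <- read.csv(out, row.names = 1)
```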

Converting directly to a dense matrix is likely to waste a lot of memory. The Matrix package can convert the sparse matrix into a memory-efficient coordinate-triplet (i, j, x) data frame with the summary() function, which can then easily be written to CSV. This is probably simpler than the Matrix Market approach. See the answer to this related question: Sparse matrix to a data frame in R
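
A short sketch of the summary() route with write.csv(). One detail worth handling: the triplets don't record trailing all-zero rows or columns, so the original dimensions are passed back to sparseMatrix() through dims when reading the file. The example matrix, whose 4th column is empty, is made up.

```r
library(Matrix)

m <- sparseMatrix(i = c(3, 1, 2), j = c(1, 2, 3), x = c(1.5, 2, -4), dims = c(3, 4))

trip <- summary(m)                    # data frame with columns i, j, x
out <- tempfile(fileext = ".csv")
write.csv(trip, out, row.names = FALSE)

# Rebuild; without dims the empty 4th column would be lost
back <- read.csv(out)
m2 <- sparseMatrix(i = back$i, j = back$j, x = back$x, dims = dim(m))
```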

Also, here is an illustration from the Matrix package documentation:

library(Matrix)
## very simple export - in triplet format - to text file:
data(CAex)
s.CA <- summary(CAex)
s.CA # shows  (i, j, x)  [columns of a data frame]
message("writing to ", outf <- tempfile())
write.table(s.CA, file = outf, row.names=FALSE)
## and read it back -- showing off  sparseMatrix():
str(dd <- read.table(outf, header=TRUE))
## has columns (i, j, x) -> we can use via do.call() as arguments to sparseMatrix():
mm <- do.call(sparseMatrix, dd)
stopifnot(all.equal(mm, CAex, tolerance=1e-15))
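
To keep the row and column names (for example the glmnet coefficient names) in the output, convert the summary() triplets into name-based triplets:

# input: a sparse matrix with named rows and columns (dimnames)
# returns: a data frame of triplets (row, col, x) suitable for writing to a CSV file
sparse2triples <- function(m) {
  SM <- summary(m)            # triplet form with columns i, j, x
  D1 <- rownames(m)[SM$i]     # row name of each non-zero entry
  D2 <- colnames(m)[SM$j]     # column name of each non-zero entry
  data.frame(row = D1, col = D2, x = SM$x)
}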

Example

> library(Matrix)
> dn <- list(LETTERS[1:3], letters[1:5])
> m <- sparseMatrix(i = c(3,1,3,2,2,1), p= c(0:2, 4,4,6), x = 1:6, dimnames = dn)

> m
3 x 5 sparse Matrix of class "dgCMatrix"
  a b c d e
A . 2 . . 6
B . . 4 . 5
C 1 . 3 . .

> sparse2triples(m)
  row col x
1   C   a 1
2   A   b 2
3   B   c 4
4   C   c 3
5   A   e 6
6   B   e 5 

[EDIT: use data.frame]
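
If you later need to read such a name-based CSV back into a sparse matrix, one option (an assumption, not part of the answer above) is to map the names back to indices with match(). Rows or columns with no non-zeros (column d in the example) never appear in the triplets, so the full dimnames must be kept separately; here they are simply reused from dn.

```r
library(Matrix)

# Same helper as above, repeated so this block runs on its own
sparse2triples <- function(m) {
  SM <- summary(m)
  data.frame(row = rownames(m)[SM$i], col = colnames(m)[SM$j], x = SM$x)
}

dn <- list(LETTERS[1:3], letters[1:5])
m <- sparseMatrix(i = c(3, 1, 3, 2, 2, 1), p = c(0:2, 4, 4, 6), x = 1:6, dimnames = dn)

out <- tempfile(fileext = ".csv")
write.csv(sparse2triples(m), out, row.names = FALSE)

# Map names back to positions; dims and dimnames come from dn, not the CSV
tr <- read.csv(out)
m2 <- sparseMatrix(i = match(tr$row, dn[[1]]), j = match(tr$col, dn[[2]]),
                   x = tr$x, dims = lengths(dn), dimnames = dn)
```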

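Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.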

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM