I'm trying to convert an RDS file containing a sparse matrix (dgCMatrix) I received from a colleague into a plain text CSV file. I realize this file will be many gigabytes large, no need to warn me. I've tried using as.matrix but I get a "problem too large" error. How can I avoid this?
> write.csv(as.matrix(x), 'table.csv')
Loading required package: Matrix
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
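A quick way to see why the conversion fails is to estimate how big the dense result would be before calling as.matrix. The sketch below uses a hypothetical helper, dense_size (not part of the Matrix package), and a small random matrix as a stand-in for the real data. It assumes, as the error message suggests but the source does not confirm, that the failure is tied to the number of cells exceeding what a 32-bit integer can index.

```r
library(Matrix)

# Hypothetical helper (illustrative name, not a Matrix function):
# report how large the dense version of a sparse matrix would be.
dense_size <- function(x) {
  n_cells <- prod(dim(x))                 # prod() returns a double, so no integer overflow
  list(
    cells      = n_cells,                 # total number of entries once densified
    gib        = n_cells * 8 / 2^30,      # 8 bytes per double
    over_int32 = n_cells > .Machine$integer.max
  )
}

# Small stand-in for the real matrix
x <- rsparsematrix(1000, 50, density = 0.01)
str(dense_size(x))
```

If over_int32 is TRUE or gib exceeds the available RAM, converting the whole matrix in one call is unlikely to work, and a row-wise approach is the safer route.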
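Why not process the sparse matrix in chunks? The function below converts chunk rows at a time to a dense matrix and appends them to the file, so the full dense matrix never has to exist in memory.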
library(Matrix)
write_sparse_csv <- function(x, file, ..., chunk = 100){
  passes <- nrow(x) %/% chunk       # number of full chunks
  remaining <- nrow(x) %% chunk     # rows left over after the full chunks
  header <- !is.null(colnames(x))   # write a header only if x has column names
  if(passes > 0){
    # the first full chunk creates the file (and the header, if any)
    inx <- seq_len(chunk)
    y <- x[inx, , drop = FALSE]
    y <- as.matrix(y)
    write.table(y, file, append = FALSE, sep = ",", col.names = header, ...)
    passes <- passes - 1L
    # the other full chunks are appended
    for(i in seq_len(passes)){
      inx <- inx + chunk
      y <- x[inx, , drop = FALSE]
      y <- as.matrix(y)
      write.table(y, file, append = TRUE, sep = ",", col.names = FALSE, ...)
    }
    if(remaining > 0){
      # rows after the last full chunk: shift by one chunk and
      # keep only the first `remaining` indices
      inx <- inx[seq_len(remaining)] + chunk
      y <- x[inx, , drop = FALSE]
      y <- as.matrix(y)
      write.table(y, file, append = TRUE, sep = ",", col.names = FALSE, ...)
    }
  } else if(remaining > 0){
    # fewer rows than one chunk: a single write creates the file
    inx <- seq_len(remaining)
    y <- x[inx, , drop = FALSE]
    y <- as.matrix(y)
    write.table(y, file, append = FALSE, sep = ",", col.names = header, ...)
  }
}
set.seed(2021)
n <- 1e6
M <- Matrix(sample(c(rep(0, 9*n/10), seq_len(n/10))), ncol = 5e2, sparse = TRUE)
dim(M)
write_sparse_csv(M, "~/tmp/test.csv")
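The example above has 2000 rows, an exact multiple of the default chunk size, so it never exercises a short final chunk. A round-trip check on a small matrix is a cheap way to confirm that every row is written exactly once. The sketch below is self-contained: write_sparse_csv_compact is a hypothetical, more compact variant of the same chunking idea (not the function from the answer), it assumes the matrix has at least one row, and it drops row and column names to keep the comparison simple.

```r
library(Matrix)

# Hypothetical compact variant of the chunked writer (illustrative name).
write_sparse_csv_compact <- function(x, file, chunk = 100) {
  stopifnot(nrow(x) > 0)                         # assumption: at least one row
  starts <- seq(1, nrow(x), by = chunk)          # first row of each chunk
  for (s in starts) {
    rows <- s:min(s + chunk - 1, nrow(x))        # the last chunk may be shorter
    y <- as.matrix(x[rows, , drop = FALSE])      # densify only this slice
    write.table(y, file, sep = ",",
                append = s > 1,                  # the first chunk creates the file
                row.names = FALSE, col.names = FALSE)
  }
  invisible(file)
}

# Round-trip check: 250 rows is not a multiple of the chunk size (100),
# so the short last chunk is exercised.
set.seed(1)
m <- rsparsematrix(250, 8, density = 0.1)
f <- tempfile(fileext = ".csv")
write_sparse_csv_compact(m, f, chunk = 100)

back <- as.matrix(read.csv(f, header = FALSE))
dimnames(back) <- NULL
stopifnot(identical(dim(back), dim(m)),
          isTRUE(all.equal(back, as.matrix(m), check.attributes = FALSE)))
```

The same check can be pointed at write_sparse_csv by passing row.names = FALSE through its ... argument. Without that argument, write.table numbers the rows of each chunk separately, so the row labels in the file restart at 1 for every chunk when the matrix has no row names.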