
"Problem too large" when trying to save dgCMatrix as csv in R

I'm trying to convert an RDS file containing a sparse matrix (dgCMatrix) I received from a colleague into a plain text CSV file. I realize this file will be many gigabytes large, no need to warn me. I've tried using as.matrix but I get a "problem too large" error. How can I avoid this?

> write.csv(as.matrix(x), 'table.csv')
Loading required package: Matrix
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

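as.matrix(x) tries to allocate the entire dense matrix at once, and that is the step that fails. Why not process the sparse matrix in chunks of rows instead? The code below is one way of doing so.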

library(Matrix)

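# Write a sparse matrix to CSV by densifying only `chunk` rows at a time,
# so the full dense matrix is never held in memory.
# Extra arguments in ... are passed on to write.table(),
# e.g. row.names = FALSE to drop the row labels.
write_sparse_csv <- function(x, file, ..., chunk = 100){
  passes <- nrow(x) %/% chunk      # number of full chunks
  remaining <- nrow(x) %% chunk    # rows left over after the full chunks
  has_header <- !is.null(colnames(x))
  if(passes > 0){
    # First chunk: create the file and write the header, if there is one.
    inx <- seq_len(chunk)
    y <- x[inx, , drop = FALSE]
    y <- as.matrix(y)
    write.table(y, file, append = FALSE, sep = ",", col.names = has_header, ...)
    passes <- passes - 1L
    # Other full chunks: append without a header.
    for(i in seq_len(passes)){
      inx <- inx + chunk
      y <- x[inx, , drop = FALSE]
      y <- as.matrix(y)
      write.table(y, file, append = TRUE, sep = ",", col.names = FALSE, ...)
    }
    if(remaining > 0){
      # Final partial chunk: only the rows after the last full chunk.
      inx <- max(inx) + seq_len(remaining)
      y <- x[inx, , drop = FALSE]
      y <- as.matrix(y)
      write.table(y, file, append = TRUE, sep = ",", col.names = FALSE, ...)
    }
  } else if(remaining > 0){
    # Fewer than `chunk` rows in total: write everything in one call.
    inx <- seq_len(remaining)
    y <- x[inx, , drop = FALSE]
    y <- as.matrix(y)
    write.table(y, file, append = FALSE, sep = ",", col.names = has_header, ...)
  }
}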
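
A note on choosing chunk: each pass holds chunk * ncol(x) doubles in memory, at 8 bytes each. The helpers below are hypothetical (they are not part of the answer's function) and only sketch that arithmetic, assuming a memory budget you pick yourself; the 256 MiB default is an arbitrary illustration.

```r
# Hypothetical helpers, for illustration only.

# Bytes needed to hold an nrow-by-ncol matrix of doubles (8 bytes per value).
dense_bytes <- function(nrow, ncol) as.numeric(nrow) * as.numeric(ncol) * 8

# Largest number of rows whose dense form fits in `budget_bytes`
# (always at least 1, so that progress is possible).
rows_per_chunk <- function(ncol, budget_bytes = 256 * 2^20) {
  max(1, floor(budget_bytes / (as.numeric(ncol) * 8)))
}

dense_bytes(2000, 500) / 2^20   # dense size in MiB of a 2000 x 500 matrix
rows_per_chunk(500)             # rows per chunk for 500 columns, 256 MiB budget
```

The result of rows_per_chunk(ncol(x)) could then be passed as the chunk argument. Larger chunks mean fewer write.table calls, so the small default of 100 is likely slower than necessary on a wide margin of machines.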

set.seed(2021)
n <- 1e6
M <- Matrix(sample(c(rep(0, 9*n/10), seq_len(n/10))), ncol = 5e2, sparse = TRUE)
dim(M)

write_sparse_csv(M, "~/tmp/test.csv")
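
The same chunking idea can also be sketched more compactly by splitting the row indices into blocks and writing to an open connection. This is an alternative sketch, not the function above: the name write_sparse_csv_chunks and its arguments are made up for illustration, it never writes row labels, and it assumes the column names contain no commas or quotes.

```r
library(Matrix)

# Hypothetical helper: writes `x` to `path` as CSV, converting at most
# `chunk` rows to a dense matrix at a time.
write_sparse_csv_chunks <- function(x, path, chunk = 1000L) {
  # Group the row indices into consecutive blocks of at most `chunk` rows.
  rows <- seq_len(nrow(x))
  blocks <- split(rows, ceiling(rows / chunk))
  con <- file(path, open = "wt")
  on.exit(close(con))
  # Header line, written once and only if the matrix has column names.
  if (!is.null(colnames(x))) {
    writeLines(paste(colnames(x), collapse = ","), con)
  }
  # Each block is densified, written, and then discarded.
  for (inx in blocks) {
    y <- as.matrix(x[inx, , drop = FALSE])
    write.table(y, con, sep = ",", row.names = FALSE, col.names = FALSE)
  }
  invisible(path)
}

# Small round trip: 7 rows written in blocks of 3, 3 and 1.
m <- Matrix(c(0, 1, 0, 2, 0, 0, 3, 0, 0, 0, 4, 0, 5, 0), nrow = 7, sparse = TRUE)
tmp <- tempfile(fileext = ".csv")
write_sparse_csv_chunks(m, tmp, chunk = 3L)
back <- as.matrix(read.csv(tmp, header = FALSE))
```

Because the connection stays open for the whole loop, there is no need to track append = TRUE or to special-case the last partial block.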
