
Error in asMethod(object): Cholmod error 'problem too large'

I have the following object

Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:120671481] 0 2 3 6 10 13 21 22 25 36 ...
  ..@ p       : int [1:51366] 0 3024 4536 8694 3302271 3302649 5715381 5756541 5784009 5801691 ...
  ..@ Dim     : int [1:2] 10314738 51365
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:10314738] "line1" "line2" "line3" "line4" ...
  .. ..$ : chr [1:51365] "sparito" "davide," "15enne" "di" ...
  .. .. ..- attr(*, ".match.hash")=Class 'match.hash' <externalptr> 
  ..@ x       : num [1:120671481] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()

This object comes from the dtm_builder function of the text2map package. I would like to remove the empty rows from the matrix, so I tried the following:

row.sum=apply(dtm,1,FUN=sum) # sum of each row of the table
dtm2=dtm[row.sum!=0,]

However, I got the following error:

Error in asMethod(object): Cholmod error 'problem too large' at file ..

How could I fix it?

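The short answer is that you are most likely converting a sparse object to a dense one. apply() coerces its input with as.matrix() before doing anything else, and a dense 10314738 x 51365 matrix would have about 5.3e11 cells, which is presumably what triggers the Cholmod error. The sparse matrix classes in the Matrix package are very memory efficient when a matrix has a lot of zeros (like a DTM) because they simply do not allocate memory for the zeros.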
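To put rough numbers on that, here is a back-of-the-envelope sketch in base R. The dimensions and the number of stored entries are copied from the str() output in the question; the byte sizes (8 per double, 4 per integer) are the usual R storage sizes, and the sparse estimate ignores the dimnames, so treat the results as approximations only.

```r
# Dimensions and number of stored entries, taken from the str() output above
n_row <- 10314738
n_col <- 51365
nnz   <- 120671481

# A dense matrix needs one 8-byte double per cell
cells     <- n_row * n_col
dense_gib <- cells * 8 / 1024^3

# A dgCMatrix stores, per nonzero entry, a 4-byte row index (@i) and an
# 8-byte value (@x), plus one 4-byte column pointer (@p) per column + 1.
# Dimnames are ignored here, so this is a lower bound.
sparse_gib <- (nnz * (4 + 8) + (n_col + 1) * 4) / 1024^3

cells > .Machine$integer.max  # the cell count does not even fit in an R integer
round(dense_gib)              # thousands of GiB
round(sparse_gib, 2)          # on the order of 1 GiB
```

So anything that forces a dense copy (apply(), as.matrix(), and so on) cannot succeed on this object, whatever the machine.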

@akrun's answer should work, but note that there are two rowSums functions: one in base R and one in the Matrix package. To get the Matrix version, you need to load the Matrix package first.

Here is an example dgCMatrix (note that the Matrix package is not loaded yet):

m1 <- Matrix::Matrix(1:9, 3, 3, sparse = TRUE)
m1[1, 1:3] <- 0
class(m1)

If we use the base R rowSums, we get an error:

rowSums(m1)
Error in rowSums(m1): 'x' must be an array of at least two dimensions

If the Matrix package is loaded, rowSums is replaced with the Matrix package's own method, which works with dgCMatrix. The same is true for the bracket operator [. If you update text2map to version 0.1.5, Matrix is loaded by default.
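As a minimal sketch of the whole fix on the small example matrix (the names row_totals and m2 are just illustrative), loading Matrix first lets both rowSums and [ work on the sparse object without ever making it dense:

```r
library(Matrix)

# Same toy matrix as above: 3 x 3, with the first row set to zero
m1 <- Matrix(1:9, 3, 3, sparse = TRUE)
m1[1, 1:3] <- 0

# With Matrix attached, this dispatches to the sparse-aware rowSums method
row_totals <- rowSums(m1)

# Keep only rows with a nonzero total; drop = FALSE keeps the result a
# matrix even if a single row survives
m2 <- m1[row_totals != 0, , drop = FALSE]

dim(m2)
class(m2)
```

On your data the equivalent would be dtm2 <- dtm[rowSums(dtm) != 0, ], run after library(Matrix).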

That is a massive DTM, so you may still run into memory issues, depending on your machine. One thing to note is that removing sparse rows/columns will not help much, because they hold very few stored entries. So, although words that occur only once or twice will make up about 60% of your columns, you will reduce the size in memory more by removing the most frequent words (i.e., words with a nonzero count in nearly every row).
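Here is a small sketch of that last point on made-up data. The toy matrix, the cutoff of two documents for "rare", and all variable names are assumptions for illustration only; the idea is simply that memory use follows the number of stored entries, not the number of columns.

```r
library(Matrix)
set.seed(1)

# Toy DTM: 1000 documents, 50 terms. Term 1 occurs in every document,
# the remaining terms are scattered over 200 random positions.
i <- c(1:1000, sample(1000, 200, replace = TRUE))
j <- c(rep(1, 1000), sample(2:50, 200, replace = TRUE))
dtm_toy <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                        dims = c(1000, 50))

# Number of stored entries per column (the document frequency of each term),
# read straight from the column pointers of the dgCMatrix
doc_freq <- diff(dtm_toy@p)

rare     <- doc_freq <= 2              # terms found in at most two documents
frequent <- doc_freq == nrow(dtm_toy)  # terms found in every document

size_drop_rare <- object.size(dtm_toy[, !rare, drop = FALSE])
size_drop_freq <- object.size(dtm_toy[, !frequent, drop = FALSE])

# Dropping the single frequent term frees far more memory than
# dropping all of the rare terms
size_drop_rare
size_drop_freq
```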
