
Using sparse matrix as an input to ranger package in R

Overview

To avoid memory issues, I converted my document-term matrix to a sparse matrix with the “Matrix” package, using the code below:

library(Matrix)
documentTermMatrixFrame <- Matrix(documentTermMatrixFrame, sparse = TRUE)

But when I try to use this matrix as input to the ranger() function of the “ranger” package, using the code below:

library(ranger)
trainSet <- documentTermMatrixFrame[1:750,]
testSet <- documentTermMatrixFrame[751:999,]
fit <- ranger(trainingColumnNames, data=trainSet,write.forest=TRUE)

I get this error:

Error in as.data.frame.default(data) : 
cannot coerce class "structure("dgCMatrix", package = "Matrix")" to a data.frame

Dataset

This is a sample of the dataset I am using:

nitemid | sUnSpsc | productDescription
7460893 | 26121609Network cable | Category 6A, Advanced MaTriX, 4-pair, 23 AWG, U/UTP copper cable, Plenum (CMP) Rated, White, 1000ft/305m
7460456 | 26121709Network cable | Shielded marine MUD-resistant armored copper cable, category 7 S/FTP, low smoke zero halogen (LSZH), 4-pair, conductors are 22 AWG construction with foamed PE insulation, twisted in pairs
7460856 | 26121890Inter connect cable | 1 PC. = 100 M 2 X 1.5 QMM, 100M SPECIAL DESIGN TO UL CLASS 2 YELLOW TPE OIL-RESISTANT AS-INTERFACE SHAPED CABLE

After preprocessing the descriptions in the dataset (stopword removal, punctuation removal, stemming, etc.), a document-term matrix is created, which is in turn converted to a sparse matrix.

Sample of the document-term matrix for the dataset

terms
doc   advance  category ..... ..... ....... ....... ....... twist
 1      1         1                                           0
 2      0         1                                           1
 3      0         0                                           0

Question

How can I use a sparse matrix as input to the ranger() function?

Could anyone please help?

Thanks in advance.

This option is not supported for now. ranger calls C++ routines, so you cannot pass a sparse matrix to it from R (just check ranger's GitHub); you would need to rewrite ranger itself. The only option is to convert the sparse matrix into a dense one and then into a data frame. If that causes memory problems, there is no simple solution.
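A minimal sketch of that dense conversion, assuming a small toy matrix modeled on the sample document-term matrix in the question (the object names dtm and denseFrame are made up for illustration):

```r
library(Matrix)

# Toy stand-in for the document-term matrix: 3 documents, 3 terms.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2),          # row (document) index of each nonzero entry
  j = c(1, 2, 2, 3),          # column (term) index of each nonzero entry
  x = rep(1, 4),              # the nonzero counts themselves
  dims = c(3, 3),
  dimnames = list(NULL, c("advance", "category", "twist"))
)

# Densify first, then convert to a data frame.
# Every zero is now stored explicitly, so memory grows to nrow * ncol cells.
denseFrame <- as.data.frame(as.matrix(dtm))
```

The resulting data frame can be passed as the data argument in the same way as in the question, at the cost of the memory the sparse format was meant to save.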

Since version 0.7.2, sparse matrices like the ones from the Matrix package can be passed to ranger; see the discussion here. Beyond what is said in that thread, sparse matrices are now also supported in the CRAN version and do not need additional parameters as in the initial GitHub version.
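As a rough sketch of what this can look like, assuming a recent ranger release that offers the x and y arguments (older releases may need dependent.variable.name with the response stored as a column of the matrix instead). The toy data and the names dense, x, y, fit and pred are made up for illustration:

```r
library(Matrix)
library(ranger)

set.seed(1)

# Hypothetical toy data: 100 documents, 20 binary term indicators.
dense <- matrix(
  as.numeric(rbinom(100 * 20, size = 1, prob = 0.1)),
  nrow = 100,
  dimnames = list(NULL, paste0("term", 1:20))
)
x <- Matrix(dense, sparse = TRUE)   # stored as a dgCMatrix

# Hypothetical class label derived from the first two terms.
y <- factor(ifelse(dense[, 1] + dense[, 2] > 0, "cable", "other"))

# Fit on the first 75 rows; the sparse matrix is passed without densifying.
fit <- ranger(x = x[1:75, ], y = y[1:75], num.trees = 50)

# Predict on the remaining 25 rows, again from the sparse matrix.
pred <- predict(fit, data = x[76:100, ])
```

The x/y form is used here instead of a formula because, as far as I know, ranger does not accept the formula interface for sparse input.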
