Subsetting A Sparse Matrix in R

I am trying to split a sparse matrix of class dgCMatrix into a training set and a test set, and then convert those matrices into xgb.DMatrix objects so I can run XGBoost (eXtreme Gradient Boosting). I run the following code (which is reproducible):

library(tidytext)  # provides cast_sparse()
library(xgboost)   # provides xgb.DMatrix()

a <- data.frame(replicate(3, sample(1:1000, 1000, replace = TRUE)))
b <- cast_sparse(a, X1, X2, X3)
c <- data.frame(replicate(3, sample(1:1000, 1000, replace = FALSE)))

sample <- sample.int(n = nrow(c), size = floor(.75 * nrow(c)), replace = FALSE)
y.train <- c$X1[sample]
y.test  <- c$X1[-sample]
x.train <- as.matrix(as.data.frame(as.matrix(b))[sample, ])
x.test  <- b[-sample, ]
train.xgb <- xgb.DMatrix(x.train, label = y.train)
test.xgb  <- xgb.DMatrix(x.test, label = y.test)

When I run the last line, I get the following error:

Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : 
  The length of labels must equal to the number of rows in the input data

For whatever reason, the dimension of the x.test matrix is only 2, whereas the label has length 250. I cannot figure out why this is happening. Any suggestions or ideas on how to fix this?

Given that the purpose of the split is for xgboost, instead of splitting a dgCMatrix you can build a single xgb.DMatrix and split that using the slice function:

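total = xgb.DMatrix(as.matrix(b), label = c$X1)
train.xgb = xgboost::slice(total, sample)
# the test set is every row that was not drawn for training
test.xgb = xgboost::slice(total, setdiff(seq_len(nrow(total)), sample))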
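
If you would rather split the dgCMatrix itself, the thing to check first is that the labels and the matrix have the same number of rows, and that the indices are drawn from the matrix's own row count. Below is a minimal sketch that uses only the Matrix package. The toy matrix `x` and labels `y` are hypothetical stand-ins for `b` and `c$X1`, not the data from the question:

```r
library(Matrix)

set.seed(42)

# Hypothetical stand-in for `b`: a 1000 x 20 sparse matrix of class dgCMatrix.
x <- rsparsematrix(nrow = 1000, ncol = 20, density = 0.05)
# Hypothetical labels, one per row of x.
y <- sample(1:1000, nrow(x), replace = TRUE)

# The labels must line up with the rows of the matrix before splitting.
stopifnot(length(y) == nrow(x))

# Draw the training rows from the matrix's own row count.
train_idx <- sample.int(n = nrow(x), size = floor(0.75 * nrow(x)), replace = FALSE)

# Row-subset the sparse matrix directly; drop = FALSE keeps it two-dimensional.
x_train <- x[train_idx, , drop = FALSE]
x_test  <- x[-train_idx, , drop = FALSE]
y_train <- y[train_idx]
y_test  <- y[-train_idx]
```

The resulting pieces stay sparse and can be passed to xgb.DMatrix() together with their labels. In the question's code it is worth comparing nrow(b) with nrow(c). As far as I understand cast_sparse(), it builds one row per distinct value of X1, so b may well have fewer than 1000 rows. In that case, indices drawn from nrow(c) would not match the rows of b, which would be consistent with the label-length error.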
