I am trying to split a sparse matrix of class dgCMatrix into a training set and a test set, and then convert those matrices into xgb.DMatrix objects to run XGBoost (eXtreme Gradient Boosting). I run the following code (which is reproducible):
```r
library(tidytext)  # cast_sparse()
library(Matrix)    # dgCMatrix class
library(xgboost)   # xgb.DMatrix()

a <- data.frame(replicate(3, sample(1:1000, 1000, replace = TRUE)))
b <- cast_sparse(a, X1, X2, X3)
c <- data.frame(replicate(3, sample(1:1000, 1000, replace = FALSE)))
sample <- sample.int(n = nrow(c), size = floor(.75 * nrow(c)), replace = FALSE)
y.train <- c$X1[sample]
y.test <- c$X1[-sample]
x.train <- as.matrix(as.data.frame(as.matrix(b))[sample, ])
x.test <- b[-sample, ]
train.xgb <- xgb.DMatrix(x.train, label = y.train)
test.xgb <- xgb.DMatrix(x.test, label = y.test)
```
When I run the last line, I get the following error:

```
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) :
  The length of labels must equal to the number of rows in the input data
```

For whatever reason, the dimension of the x.test matrix is only 2, whereas the label has length 250. I cannot figure out why this is happening. Any suggestions or ideas on how to fix this?
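The error message names the problem directly: the number of rows in x.test does not match length(y.test). One likely source in this example is that cast_sparse() returns one row per distinct value of X1, so b typically has around 630 rows rather than 1,000, while the split indices are drawn from 1:1000 and the labels come from the unrelated data frame c. The following is a minimal sketch of a split that keeps features and labels aligned. It assumes only the Matrix package, uses sparseMatrix() as a hypothetical stand-in for cast_sparse(), and uses placeholder labels y (one per row of b); the xgboost calls run only if that package is installed.

```r
library(Matrix)

set.seed(1)
a <- data.frame(replicate(3, sample(1:1000, 1000, replace = TRUE)))

# Hypothetical stand-in for cast_sparse(a, X1, X2, X3): one row per distinct
# X1 value, one column per distinct X2 value (duplicate cells are summed here).
rows <- factor(a$X1)
cols <- factor(a$X2)
b <- sparseMatrix(i = as.integer(rows), j = as.integer(cols), x = a$X3,
                  dims = c(nlevels(rows), nlevels(cols)))

nrow(b)  # fewer than 1000, so indices drawn from 1:1000 do not fit b

# Placeholder labels, one per row of b (hypothetical data).
y <- rnorm(nrow(b))

# Draw the split from b's own rows so features and labels stay aligned.
idx <- sample.int(nrow(b), size = floor(0.75 * nrow(b)))
x.train <- b[idx, , drop = FALSE]
x.test  <- b[-idx, , drop = FALSE]
y.train <- y[idx]
y.test  <- y[-idx]

# The same row/label length condition that xgb.DMatrix() enforces.
stopifnot(nrow(x.train) == length(y.train), nrow(x.test) == length(y.test))

if (requireNamespace("xgboost", quietly = TRUE)) {
  train.xgb <- xgboost::xgb.DMatrix(x.train, label = y.train)
  test.xgb  <- xgboost::xgb.DMatrix(x.test, label = y.test)
}
```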
Given that the split is only needed for xgboost, instead of splitting the dgCMatrix you can build a single xgb.DMatrix and split it with the slice function:
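```r
library(xgboost)
# xgb.DMatrix() also accepts the dgCMatrix b directly, without as.matrix()
total = xgb.DMatrix(as.matrix(b), label = c$X1)
train.xgb = xgboost::slice(total, sample)
```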
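The slice approach can also produce the test set by slicing the complement of the training indices. This is a sketch assuming labels that line up one-to-one with the matrix rows (x and y below are hypothetical random data). The name of the slicing function depends on the xgboost version: older releases export slice(), while newer ones provide xgb.slice.DMatrix() instead, so the code looks up whichever one the installed package has.

```r
library(Matrix)

set.seed(42)
# Hypothetical feature matrix and labels with matching row counts.
x <- rsparsematrix(1000, 20, density = 0.1)
y <- rnorm(1000)

train.idx <- sample.int(nrow(x), size = floor(0.75 * nrow(x)))
test.idx  <- setdiff(seq_len(nrow(x)), train.idx)  # complement of the training rows

if (requireNamespace("xgboost", quietly = TRUE)) {
  total <- xgboost::xgb.DMatrix(x, label = y)
  ns <- asNamespace("xgboost")
  # Newer releases provide xgb.slice.DMatrix(); older ones provide slice().
  slicer <- if (exists("xgb.slice.DMatrix", envir = ns)) {
    get("xgb.slice.DMatrix", envir = ns)
  } else {
    get("slice", envir = ns)
  }
  train.xgb <- slicer(total, train.idx)
  test.xgb  <- slicer(total, test.idx)
  stopifnot(nrow(train.xgb) == 750, nrow(test.xgb) == 250)
}
```

Slicing one DMatrix means the labels travel with their rows, so the two halves cannot drift out of alignment the way separately indexed matrices and label vectors can.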