predict in caret ConfusionMatrix is removing rows

I'm fairly new to the caret library and it's causing me some problems. Any help or advice would be appreciated. My situation is as follows:

I'm trying to fit a generalized linear model (method = "glm") to some data and, when I pass the results to confusionMatrix, I get the error 'the data and reference factors must have the same number of levels'. I know what this error means (I've run into it before), but I've double- and triple-checked my data manipulation and it all looks correct (I'm using the right variables in the right places), so I'm not sure why the two arguments to confusionMatrix disagree. I've run almost exactly the same code for a different variable and it works fine.

I went through every variable and everything was balanced until I got to the predict call inside confusionMatrix. I discovered this by doing the following:

a <- table(testing2$hold1yes0no)
a[1] + a[2]

1543

b <- table(predict(modelFit, trainTR2))
dim(b)

[1] 1538

Those two values shouldn't disagree. Where are the missing 5 rows?
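A side note on this check, offered as a possibility rather than a diagnosis: dim() applied to a one-way table returns the number of distinct values that were tabulated, not the number of observations. If predict() is returning numeric values instead of class labels, 1538 could be the count of distinct predictions among the 1543 rows, in which case no rows are missing. The sketch below uses made-up vectors (actual and scores are hypothetical names, not taken from the original data) to show the difference in base R:

```r
# Hypothetical stand-ins for the real data; the values are invented.
actual <- factor(c(0, 1, 1, 0, 1, 1, 0))          # 7 observations, 2 classes
scores <- c(0.1, 0.2, 0.2, 0.7, 0.9, 0.9, 0.9)    # 7 numeric predictions, 4 distinct values

tab_actual <- table(actual)
tab_scores <- table(scores)

n_actual   <- sum(tab_actual)        # 7: total count across all cells
n_distinct <- dim(tab_scores)        # 4: number of distinct values, not rows
n_scores   <- length(scores)         # 7: number of predictions

print(c(n_actual = n_actual, n_distinct = n_distinct, n_scores = n_scores))
```

Comparing length(predict(modelFit, trainTR2)) with nrow(testing2) is a more direct row count than dim() on a table.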

My code is below:

library(caret)

set.seed(2382)

# Split into 60% training and 40% testing, stratified on the outcome
inTrain2 <- createDataPartition(y=HOLD$hold1yes0no, p = 0.6, list = FALSE)
training2 <- HOLD[inTrain2,]
testing2 <- HOLD[-inTrain2,]

# Estimate the Box-Cox transformation on the training predictors (columns 1-9 dropped)
preProc2 <- preProcess(training2[-c(1,2,3,4,5,6,7,8,9)], method="BoxCox")
trainPC2 <- predict(preProc2, training2[-c(1,2,3,4,5,6,7,8,9)])

# Apply the same transformation to the testing predictors
trainTR2 <- predict(preProc2, testing2[-c(1,2,3,4,5,6,7,8,9)])

modelFit <- train(training2$hold1yes0no ~ ., method ="glm", data = trainPC2)
confusionMatrix(testing2$hold1yes0no, predict(modelFit,trainTR2))

I'm not sure, as I don't know your data structure, but I wonder if this is due to the way you set up modelFit using the formula method. In this case, you are specifying y = training2$hold1yes0no and x = everything else. Perhaps you should try:

modelFit <- train(trainPC2, training2$hold1yes0no, method="glm")

This specifies y = training2$hold1yes0no and x = trainPC2.
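As a rough illustration of the x/y interface, here is a minimal sketch on synthetic data. The data frame toy and its columns outcome, x1 and x2 are hypothetical; the sketch also assumes the outcome is stored as a factor, which may not match the original HOLD data. It is not a reconstruction of the asker's setup.

```r
library(caret)

set.seed(2382)

# Hypothetical data standing in for HOLD; strictly positive predictors so Box-Cox applies
n <- 200
toy <- data.frame(
  outcome = factor(sample(c("no", "yes"), n, replace = TRUE)),
  x1 = rexp(n) + 0.1,
  x2 = rexp(n) + 0.1
)

idx <- createDataPartition(y = toy$outcome, p = 0.6, list = FALSE)
train_df <- toy[idx, ]
test_df  <- toy[-idx, ]

# Estimate the transformation on the training predictors, then apply it to both sets
pp <- preProcess(train_df[, c("x1", "x2")], method = "BoxCox")
train_x <- predict(pp, train_df[, c("x1", "x2")])
test_x  <- predict(pp, test_df[, c("x1", "x2")])

# x/y interface: predictors and outcome are passed separately
fit <- train(x = train_x, y = train_df$outcome, method = "glm")

pred <- predict(fit, newdata = test_x)

# With a factor outcome there is one predicted class label per test row
print(length(pred) == nrow(test_df))

# confusionMatrix takes the predictions as `data` and the observed classes as `reference`
print(confusionMatrix(data = pred, reference = test_df$outcome))
```

If the real outcome column is numeric 0/1, converting it with factor() before calling train() is one way to get class labels from predict().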


 