
R caret LDA error when using resampling

I am running into a problem using LDA through caret with categorical predictors. For some reason, enabling resampling throws an error that isn't very informative. Has anyone seen this before?

Here is a reproducible toy example:

library(caret)
library(MASS)
set.seed(1)  # arbitrary seed so the random toy data is repeatable
DF <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x2 = sample(as.factor(1:2), 200, replace = TRUE)
)

# These two lines produce the same results
lda(DF[, -1], DF[, 1])
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'none'))$finalModel

# This gives an error
train(DF[, -1], DF[, 1], method = 'lda', trControl = trainControl(method = 'cv'))$finalModel

Error in train.default(DF[, -1], DF[, 1], method = "lda", trControl = trainControl(method = "cv")) : 
  Stopping

This seems to happen when factor variables are passed as predictors through the x/y interface instead of the formula interface. The formula version works:

train(y ~ x1 + x2, data = DF, method = 'lda', 
      trControl = trainControl(method = 'cv'))$finalModel

Alternatively, after converting the factor variables to binary dummy variables, the x/y syntax also works:

# Convert independent variables to dummy variables
DF$x1 <- as.numeric(DF$x1 == "2")
DF$x2 <- as.numeric(DF$x2 == "2")
train(DF[, -1], DF[, 1], method = 'lda', 
      trControl = trainControl(method = 'cv'))$finalModel
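
The manual `== "2"` conversion above only covers two-level factors. As a minimal sketch, assuming base R's `model.matrix()` is an acceptable way to build the dummy columns, the same idea might be generalized as follows. The three-level factor `x3` and the names `DF3`, `X`, and `fit` are hypothetical additions for illustration and are not part of the original example:

```r
library(MASS)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
DF3 <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x3 = sample(as.factor(1:3), 200, replace = TRUE)  # hypothetical 3-level factor
)

# Build 0/1 dummy columns for every factor and drop the intercept column
X <- model.matrix(~ x1 + x3, data = DF3)[, -1]
colnames(X)  # "x12" "x32" "x33"

# The numeric matrix goes through the x/y interface of lda()
fit <- lda(X, grouping = DF3$y)
fit$means
```

The same matrix `X` could presumably be handed to `train(X, DF3$y, method = 'lda', ...)` in place of the data frame, since every column is numeric at that point.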

Note that the reported group means depend on the approach: they are around 1.5 for the first two calls in the question, which apparently coerce the factor levels to the numeric codes 1 and 2, and around 0.5 when 0/1 dummy variables are used (the formula interface or the manual conversion).
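
The shift in the group means can be illustrated without fitting any model. The sketch below assumes that the data-frame path converts factors roughly the way `data.matrix()` does (integer level codes), while the formula path uses treatment-coded dummies as produced by `model.matrix()`. The object names `codes`, `dummies`, and `shift` are made up for this illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
DF <- data.frame(
  y  = sample(as.factor(1:2), 200, replace = TRUE),
  x1 = sample(as.factor(1:2), 200, replace = TRUE),
  x2 = sample(as.factor(1:2), 200, replace = TRUE)
)

# Factors coerced to their integer level codes (values 1 and 2)
codes <- data.matrix(DF[, -1])

# Factors expanded to 0/1 dummy columns, intercept column dropped
dummies <- model.matrix(~ x1 + x2, data = DF)[, -1]

colMeans(codes)    # close to 1.5 for balanced random data
colMeans(dummies)  # close to 0.5 for balanced random data

# The two encodings differ by a constant offset of exactly 1
shift <- unname(colMeans(codes) - colMeans(dummies))
shift
```

Because the two encodings differ only by a constant offset, the discriminant directions should be unaffected. Only the reported means move.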
