Caret and KNN in R: predict function gives error

I am trying to predict with a simplified KNN model using the caret package in R. It always gives the same error, even in this very simple reproducible example:

library(caret)
set.seed(1)

#generate training dataset "a" 
n = 10000
a = matrix(rnorm(n*8,sd=1000000),nrow = n)
y = round(runif(n))
a = cbind(y,a)
a = as.data.frame(a)
a[,1] = as.factor(a[,1])
colnames(a) = c("y",paste0("V",1:8))

#estimate simple KNN model
ctrl <- trainControl(method="none",repeats = 1)
knnFit <- train(y ~ ., data = a, method = "knn", trControl = ctrl, preProcess = c("center","scale"),  tuneGrid = data.frame(k = 10))

#predict on the training dataset (=useless, but should work)
knnPredict <- predict(knnFit,newdata = a,  type="prob")

This gives:

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : undefined columns selected

Defining a more realistic test dataset "b" without the target variable y...

#generate test dataset
b =  matrix(rnorm(n*8,sd=1000000),nrow = n) 
b = as.data.frame(b)
colnames(b) = c(paste0("V",1:8))

#predict on the test dataset
knnPredict <- predict(knnFit,newdata = b,  type="prob")

...gives the same error:

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : undefined columns selected

I know that the column names are important, but here they are identical. What is wrong here? Thanks!

The problem is your y variable. When you ask for class probabilities, the train and/or the predict function puts them into a data frame with one column per class. If the factor levels are not valid variable names, they are changed automatically (e.g. "0" becomes "X0"), so the columns can no longer be selected by the original levels. See also this post.
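The renaming can be reproduced in base R without caret. This is only a minimal sketch: it assumes the mangling behaves like the default column-name check of `data.frame()` (which uses `make.names()`), and the names `lev`, `m`, `probs` and `res` are illustrative, not taken from caret's internals.

```r
# Factor levels as produced by as.factor() on a 0/1 column
lev <- c("0", "1")

# make.names() turns syntactically invalid names into valid ones
valid <- make.names(lev)
print(valid)  # "X0" "X1"

# A data frame built from a matrix with these column names gets them
# renamed, because check.names = TRUE is the default of data.frame()
m <- matrix(c(0.3, 0.7, 0.6, 0.4), nrow = 2, byrow = TRUE,
            dimnames = list(NULL, lev))
probs <- data.frame(m)
print(colnames(probs))  # "X0" "X1"

# Selecting by the original levels now fails, as in the question;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(probs[, lev, drop = FALSE],
                error = function(e) conditionMessage(e))
print(res)
```

The last call fails because the data frame has columns "X0" and "X1", while the selection still asks for "0" and "1".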

If you replace the a[,1] = as.factor(a[,1]) line in your code with the following, it should work:

a[,1] = factor(a[,1], labels = c("no", "yes"))
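To catch this before training, one could check whether the factor levels are already valid R names. The following is a sketch in base R; the helper `has_valid_levels` is hypothetical and not part of caret.

```r
# Hypothetical helper: TRUE if every factor level is a syntactically
# valid R name, i.e. make.names() would leave it unchanged
has_valid_levels <- function(f) {
  lv <- levels(f)
  all(lv == make.names(lv))
}

# Levels "0" and "1", as in the question
y_bad <- as.factor(c(0, 1, 1, 0))

# Levels "no" and "yes", as in the fix above
y_good <- factor(c(0, 1, 1, 0), labels = c("no", "yes"))

print(has_valid_levels(y_bad))   # FALSE
print(has_valid_levels(y_good))  # TRUE
```

Running such a check on a$y before calling train would flag the 0/1 levels early, rather than at prediction time.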
