Problems applying a model to a test data set to predict the label in R using caret's train method
I have a training data set, let's call it "training_data", which consists of 19 variables (features) and 1 label, for a total of 20 variables (columns). This data set only contains the best predictors: low-variance columns and bad predictors have already been removed, so this is the data frame that results from feature selection. Let's call the label in this data set "final_score".
I also have a test data set, let's call it "predictions_data", that has the same 19 variables (features) but no label variable, so in total this set has 19 variables (columns).
I'm building a very simple regression model, using "lasso" regression through caret's "train" method, to train the model and then predict the label ("final_score") for "predictions_data".
My code is as follows:
# Load caret, which provides trainControl() and train():
library(caret)
# Import training data as a data frame:
training_data <- data.frame(training_data)
# Set cross-validation folds and repeats:
fitControl <- trainControl(method = "repeatedcv",
                           number = 3,   # number of folds
                           repeats = 3)  # repeated three times
# Train the model using "lasso" regression from the train method. The model is called "model.cv":
model.cv <- train(final_score ~ .,
                  data = training_data,
                  method = "lasso",
                  trControl = fitControl,
                  preProcess = c('scale', 'center'))
So far, everything goes well: the model shows the best results from cross-validation and the metrics obtained (RMSE, MAE, etc.).
So now I want to apply the model to "predictions_data", so that the model can "predict" final_score.
My code for trying to do this is:
# Import test data set to a data frame (with no label column):
predictions_data <- data.frame(predictions_data)
# Apply the model using the predict function and save the result in an object called "predictions":
predictions <- predict(model.cv, newdata = predictions_data)
And here comes the problem. Even though I stated newdata = predictions_data, the predictions object returns the predicted labels for the training data set and not the test data set... What am I doing wrong? (Obviously this is a very basic model, but even so it should work for predictions, right?)
Thanks in advance!
The test data set had some data in an incorrect format (i.e. NAs in numeric columns), unlike the training data set, which had been cleaned and prepared for training. As soon as the test data was cleaned and prepared in the same way, the predict function executed correctly.
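For anyone hitting the same issue, here is a minimal base-R sketch of the kind of check that would have caught this before calling predict. The two small data frames and their column names (x1, x2) are hypothetical stand-ins, not the data from the question; only the label name final_score is taken from the post. It compares column names and classes between the two sets, counts NAs per test column, and then keeps only rows that are complete after coercion, which is one possible cleaning choice rather than the only one.

```r
# Toy stand-ins for the real data (hypothetical values, not from the post).
training_data <- data.frame(x1 = c(1.0, 2.0, 3.0, 4.0),
                            x2 = c(0.5, 0.1, 0.9, 0.3),
                            final_score = c(10, 20, 30, 40))
predictions_data <- data.frame(x1 = c(1.5, NA, 3.5),
                               x2 = c("0.2", "0.4", "oops"),
                               stringsAsFactors = FALSE)

# Features are every training column except the label.
feature_names <- setdiff(names(training_data), "final_score")

# 1. Features the model expects but the test set lacks.
missing_cols <- setdiff(feature_names, names(predictions_data))
common <- intersect(feature_names, names(predictions_data))

# 2. Columns whose class differs between training and test data.
first_class <- function(col) class(col)[1]
train_classes <- vapply(training_data[common], first_class, character(1))
test_classes  <- vapply(predictions_data[common], first_class, character(1))
type_mismatch <- common[train_classes != test_classes]

# 3. Number of NAs per test column, before any cleaning.
na_counts <- colSums(is.na(predictions_data[common]))

# 4. One possible clean-up: coerce mismatched columns to numeric where the
#    training column is numeric, then drop rows that are still incomplete.
clean_data <- predictions_data
for (col in type_mismatch) {
  if (is.numeric(training_data[[col]])) {
    clean_data[[col]] <- suppressWarnings(
      as.numeric(as.character(clean_data[[col]])))
  }
}
clean_data <- clean_data[complete.cases(clean_data[common]), , drop = FALSE]

print(missing_cols)
print(type_mismatch)
print(na_counts)
print(nrow(clean_data))
```

After cleaning, comparing length(predictions) with nrow(predictions_data) is a quick sanity check. As far as I know, caret's predict method for train objects defaults to na.action = na.omit, so rows with missing values are dropped silently and the result can be shorter than the input.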