Problem applying a model to a test data set to predict labels in R using caret's train method

Question

I have a training data set, let's call it "training_data", which consists of 19 variables (features) and 1 label, for a total of 20 variables (columns). This data set contains only the best predictors: low-variance columns and poor predictors have already been removed, so it is the data frame that results from feature selection. Let's call the label in this data set "final_score".

I also have a test data set, let's call it "predictions_data", which has the same 19 variables (features) but no label variable, so this set has 19 variables (columns) in total.

I'm building a very simple regression model, using lasso regression through caret's "train" method, to train the model and then predict the labels ("final_score") for "predictions_data".

My code is as follows:

library(caret)

# Import training data as a data frame:
training_data <- data.frame(training_data)

# Set cross-validation folds and repeats:
fitControl <- trainControl(method = "repeatedcv",
                           number = 3,     # number of folds
                           repeats = 3)    # repeated three times

# Train the model using "lasso" regression from the train method. The model is called "model.cv":
model.cv <- train(final_score ~ .,
                  data = training_data,
                  method = "lasso",
                  trControl = fitControl,
                  preProcess = c('scale', 'center'))

So far everything works: the model reports the best results from cross-validation along with the metrics obtained (RMSE, MAE, etc.).

Now I want to apply the model to "predictions_data" so that it can predict final_score.

My code for trying to do this is:

# Import test data set to a data frame (with no label column):
predictions_data <- data.frame(predictions_data)

# Apply the model using the predict function from caret, and save the result in an object called "predictions":
predictions <- predict(model.cv, newdata = predictions_data)

And here comes the problem. Even though I stated newdata = predictions_data, the predictions object contains the predicted labels for the training data set and not for the test data set. What am I doing wrong? (Obviously this is a very basic model, but even so it should work for predictions, right?)

Thanks in advance!
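One quick way to confirm whether predict() really used newdata is to compare the number of predictions with the number of rows in each data set. The sketch below is only an illustration: it uses toy data and base R's lm() as a stand-in for the caret model, and the names train_df, test_df and fit are hypothetical.

```r
# Toy data; lm() stands in for the caret model purely for illustration.
set.seed(1)
train_df <- data.frame(x = 1:10)
train_df$y <- 2 * train_df$x + rnorm(10, sd = 0.1)
test_df <- data.frame(x = c(11, 12, 13))

fit <- lm(y ~ x, data = train_df)

# Without newdata, predict() returns fitted values for the training rows.
fitted_on_train <- predict(fit)

# With newdata, predict() returns one value per row of the new data.
predicted_on_test <- predict(fit, newdata = test_df)

# Compare lengths against the row counts of each data set.
length(fitted_on_train) == nrow(train_df)
length(predicted_on_test) == nrow(test_df)
```

If the length of predictions matches nrow(training_data) rather than nrow(predictions_data), or matches neither, the test data itself is worth inspecting next.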

Answer 1

The test data set had some data in an incorrect format (i.e. NAs in numeric columns), whereas the training data set had been cleaned and prepared for training. As soon as the test data was cleaned and prepared, the predict function executed correctly.
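The kind of check described above could look something like the following sketch. It uses only base R and toy data. The columns x1 and x2, the bad values, and the helper variables (feature_names, cleaned, and so on) are assumptions made for illustration, not the original poster's data.

```r
# Toy stand-ins for the real data (hypothetical columns and values).
training_data <- data.frame(x1 = c(1, 2, 3, 4),
                            x2 = c(0.5, 1.5, 2.5, 3.5),
                            final_score = c(10, 20, 30, 40))
predictions_data <- data.frame(x1 = c(1, NA, 3),
                               x2 = c("0.5", "1.5", "oops"),
                               stringsAsFactors = FALSE)

feature_names <- setdiff(names(training_data), "final_score")

# 1. Are all training features present in the test data?
missing_cols <- setdiff(feature_names, names(predictions_data))

# 2. Do the column classes match between the two data sets?
train_classes <- sapply(training_data[feature_names], function(col) class(col)[1])
test_classes  <- sapply(predictions_data[feature_names], function(col) class(col)[1])
class_mismatch <- feature_names[train_classes != test_classes]

# 3. How many NAs does each test column contain?
na_counts <- colSums(is.na(predictions_data[feature_names]))

# Clean the test data: coerce the mistyped column to numeric
# (unparseable entries become NA), then keep only complete rows.
cleaned <- predictions_data
cleaned$x2 <- suppressWarnings(as.numeric(cleaned$x2))
cleaned <- cleaned[complete.cases(cleaned[feature_names]), , drop = FALSE]

print(missing_cols)
print(class_mismatch)
print(na_counts)
print(nrow(cleaned))
```

Dropping incomplete rows is only one option; imputing the missing values is another, and which one is appropriate depends on the data. As far as I know, caret's predict method for train objects also takes an na.action argument that defaults to na.omit, so rows containing NA may be dropped silently and the number of predictions can differ from nrow(predictions_data).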

Solution 1, 2019-11-23 23:10:20
