Problems applying a model to a test data set to predict labels in R using caret's train method

I have a training data set, call it "training_data", with 19 variables (features) and 1 label, for a total of 20 columns. It contains only the best predictors: low-variance columns and weak predictors have already been removed, so this is the data frame that results from feature selection. The label in this data set is called "final_score".

I also have a test data set, call it "predictions_data", with the same 19 variables (features) but no label column, so it has 19 columns in total.

I'm fitting a very simple regression model, a lasso regression via caret's "train" method, and then want to use it to predict the label ("final_score") for "predictions_data".

My code is as follows:

library(caret)

# Import training data as a data frame:
training_data <- data.frame(training_data)

# Set the number of cross-validation folds and repeats:
fitControl <- trainControl(method = "repeatedcv",
                           number = 3,     # number of folds
                           repeats = 3)    # repeat the 3-fold CV three times

# Train the model using "lasso" regression from the train method. I've called the model "model.cv":
model.cv <- train(final_score ~ .,
                  data = training_data,
                  method = "lasso",
                  trControl = fitControl,
                  preProcess = c('scale', 'center'))

So far everything works: the model reports the best cross-validation result along with the metrics obtained (RMSE, MAE, etc.).

Now I want to apply the model to "predictions_data" so that it predicts final_score.

My code for this is:

# Import the test data set as a data frame (with no label column):
predictions_data <- data.frame(predictions_data)

# Apply the model using caret's predict function and save the result in an object called "predictions":
predictions <- predict(model.cv, newdata = predictions_data)

And here is the problem. Even though I set newdata = predictions_data, the predictions object contains the predicted labels for the training data set, not the test data set. What am I doing wrong? (Obviously this is a very basic model, but even so it should work for prediction, right?)

Thanks in advance!

The test data set had some data in an incorrect format (i.e. NAs in numeric columns), unlike the training data set, which had been cleaned and prepared for training. Once the test data was cleaned and prepared, the predict function executed correctly.
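
For anyone hitting the same thing, here is a minimal base-R sketch of how the test set could be checked against the training set before calling predict. It is not the exact cleanup used here. The toy data frames, the column names x1 and x2, and the helper variables (features, common, missing_cols, class_mismatch, na_counts, clean_data) are all hypothetical stand-ins, and the coercion step assumes every feature is numeric in the training data.

```r
# Hypothetical toy data standing in for training_data and predictions_data.
training_data <- data.frame(
  x1 = c(1.0, 2.0, 3.0, 4.0),
  x2 = c(0.5, 0.1, 0.9, 0.3),
  final_score = c(10, 20, 30, 40)
)
predictions_data <- data.frame(
  x1 = c(1.5, NA, 3.5),
  x2 = c("0.2", "0.4", "n/a"),   # numeric column read in as text by mistake
  stringsAsFactors = FALSE
)

# Feature names the model was trained on (everything except the label).
features <- setdiff(names(training_data), "final_score")

# 1. Features the model expects but the test set does not have.
missing_cols <- setdiff(features, names(predictions_data))
common <- intersect(features, names(predictions_data))

# 2. Features whose class differs between training and test data.
train_class <- sapply(training_data[common], function(col) class(col)[1])
test_class  <- sapply(predictions_data[common], function(col) class(col)[1])
class_mismatch <- common[train_class != test_class]

# 3. Number of NA values per feature in the test set.
na_counts <- colSums(is.na(predictions_data[common]))

# One possible cleanup (an assumption, not the only option):
# coerce mismatched columns to numeric, then keep only complete rows.
clean_data <- predictions_data
for (col in class_mismatch) {
  clean_data[[col]] <- suppressWarnings(as.numeric(clean_data[[col]]))
}
clean_data <- clean_data[complete.cases(clean_data[common]), , drop = FALSE]
```

With real data, the cleaned frame would then go to predict(model.cv, newdata = clean_data). Comparing length(predictions) with nrow(clean_data) is a quick way to confirm that there is one prediction per test row. Dropping incomplete rows is only one choice; imputing the missing values may be more appropriate if every test row needs a prediction.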
