How to test your tuned SVM model on a new dataset using machine learning and the caret package in R?

Guys!

I am a newbie in machine learning methods and have a question about it. I am trying to use the caret package in R to get started and work with my dataset.

I have a training dataset (Dataset1) with mutation information for my gene of interest, let's say Gene A.

In Dataset1, the mutation status of Gene A is recorded as Mut or Not-Mut. I used Dataset1 with an SVM model to predict this output (I chose SVM because it was more accurate than LVQ or GBM). As a first step, I divided the dataset into training and test groups, because the dataset already labels each sample as train or test. Then I ran 10-fold cross-validation, tuned the model, and assessed its performance on the test set (using a ROC curve). Everything goes fine up to this step.

I have another dataset, Dataset2, which doesn't have mutation information for Gene A. What I want to do now is apply my tuned SVM model from Dataset1 to Dataset2 to see whether it can give me the mutation status of Gene A in Dataset2 as Mut/Not-Mut. I've gone through the caret package guide but couldn't work it out. I am stuck here and don't know what to do.

I am not sure if I chose the right approach. Any suggestions or help would really be appreciated.

Here is my code up to the point where I tuned my model on the first dataset.

Selecting the training and test sets from the first dataset:

library(caret) # train(), trainControl(), confusionMatrix()

M_train <- Dataset1[Dataset1$Case == "train", -1] # training rows, first column dropped

M_test <- Dataset1[Dataset1$Case == "test", -1] # test rows, first column dropped

y <- as.factor(M_train$Class) # target variable for training


ctrl <- trainControl(method = "repeatedcv",           # repeated k-fold CV (number defaults to 10)
                     repeats = 5,                       # do 5 repetitions of CV
                     summaryFunction = twoClassSummary, # use AUC to pick the best model
                     classProbs = TRUE)                 # class probabilities are needed for ROC


# Use expand.grid to specify the search space.
# Note that the default search grid selects 3 values of each tuning parameter.
# These are gbm tuning parameters; the svmRadial call below does not use
# this grid and relies on tuneLength instead.

grid <- expand.grid(interaction.depth = seq(1, 4, by = 2), # tree depths 1 and 3
                    n.trees = seq(10, 100, by = 10),       # let iterations go from 10 to 100
                    shrinkage = c(0.01, 0.1),              # try 2 values for the learning rate
                    n.minobsinnode = 20)


# Set up for parallel processing
library(doParallel)
# set.seed(1951)
registerDoParallel(cores = 2) # give the worker count once; the original passed both 4 and cores=2


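# Train and tune the SVM
svm.tune <- train(x = M_train[, setdiff(names(M_train), "Class")], # predictors only, target excluded
                  y = as.factor(M_train$Class),   # outcome must be a factor for classification
                  method = "svmRadial",
                  tuneLength = 9,                 # 9 values of the cost parameter
                  preProc = c("center", "scale"),
                  metric = "ROC",
                  trControl = ctrl)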

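# Finally, assess the performance of the model using the test set.

# Make predictions on the test data with the SVM model
svm.pred <- predict(svm.tune, M_test)

# The reference must be a factor with the same levels as the predictions
confusionMatrix(svm.pred, factor(M_test$Class, levels = levels(svm.pred)))

svm.probs <- predict(svm.tune, M_test, type = "prob") # class probabilities for the ROC curve

library(pROC) # roc()
svm.ROC <- roc(predictor = svm.probs$mut,
               response = as.factor(M_test$Class),
               levels = levels(as.factor(M_train$Class))) # the two class labels, not the whole vector

plot(svm.ROC, main = "ROC for SVM built with GA selected features")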
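A side note on the class labels, as a minimal sketch: with classProbs = TRUE, caret needs class levels that are valid R variable names, and a label such as Not-Mut is not one. Assuming the Class column really holds the strings "Mut" and "Not-Mut" (the toy vector below is made up), base R's make.names() shows what a safe version of the labels could look like:

```r
# Made-up labels standing in for Dataset1$Class
raw_labels <- c("Mut", "Not-Mut", "Mut", "Not-Mut")

# make.names() replaces characters that are invalid in R names with "."
y_safe <- factor(make.names(raw_labels))

levels(y_safe)
```

If the labels were recoded this way, the probability columns returned by predict(..., type = "prob") would carry the recoded names, so the column used for the ROC curve would have to match them.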

So this is where I am stuck: how can I use the svm.tune model to predict the mutation status of Gene A in Dataset2?

Thanks in advance,

Now you just take the model you built and tuned and predict from it using predict:

D2.predictions <- predict(svm.tune, newdata = Dataset2)

The key is to be sure that you have ALL of the same predictor variables in this set, with the same column names (and, in my paranoid world, in the same order).
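That check can be made explicit before calling predict. The following is only a sketch in base R: align_predictors is a hypothetical helper, not part of caret, and the column names in the toy data are invented for illustration.

```r
# Hypothetical helper: verify that newdata has every training predictor,
# then return only those columns in the training order.
align_predictors <- function(newdata, predictor_names) {
  missing_cols <- setdiff(predictor_names, names(newdata))
  if (length(missing_cols) > 0) {
    stop("newdata is missing predictors: ",
         paste(missing_cols, collapse = ", "))
  }
  newdata[, predictor_names, drop = FALSE]
}

# Toy illustration with made-up column names
train_predictors <- c("geneX", "geneY", "geneZ")
toy_new <- data.frame(geneZ = c(0.3, 0.1),
                      extra = c(1, 2),      # column not seen in training
                      geneX = c(1.2, 0.7),
                      geneY = c(5.0, 4.1))

aligned <- align_predictors(toy_new, train_predictors)
names(aligned)
```

With the real objects, the predictor names could be taken from the training data, for example setdiff(names(M_train), "Class"), assuming Class is the only non-predictor column left in M_train, and the aligned Dataset2 would then be passed as newdata.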

D2.predictions will contain your predicted classes for the unlabeled data.
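Since Dataset2 has no labels to score against, it can help to keep the class probabilities next to the predicted classes. The sketch below uses made-up stand-in values instead of real predict output; the class names mut and notmut, the sample IDs, and the 0.1 margin are all assumptions for illustration.

```r
# Stand-ins for the two predict() outputs on Dataset2 (values are made up):
#   D2.predictions <- predict(svm.tune, newdata = Dataset2)
#   D2.probs       <- predict(svm.tune, newdata = Dataset2, type = "prob")
D2.predictions <- factor(c("mut", "notmut", "notmut"),
                         levels = c("mut", "notmut"))
D2.probs <- data.frame(mut    = c(0.91, 0.48, 0.12),
                       notmut = c(0.09, 0.52, 0.88))

# One row per sample: predicted class plus the probability of "mut"
D2.results <- data.frame(sample    = c("S1", "S2", "S3"), # hypothetical IDs
                         predicted = D2.predictions,
                         prob_mut  = D2.probs$mut)

# Flag calls whose probability is close to 0.5 (the 0.1 margin is arbitrary)
D2.results$uncertain <- abs(D2.results$prob_mut - 0.5) < 0.1

table(D2.results$predicted)
```

Samples flagged as uncertain are the ones where the Mut/Not-Mut call is least reliable and may deserve a closer look.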


