How to test your tuned SVM model on a new dataset using machine learning and the caret package in R?
Guys!
I am a newbie in machine learning methods and have a question about it. I am trying to use the caret package in R to get started and work with my dataset.
I have a training dataset (Dataset1) with mutation information for my gene of interest, let's say Gene A. In Dataset1, the mutation status of Gene A is recorded as Mut or Not-Mut. I used Dataset1 with an SVM model to predict the output (I chose SVM because it was more accurate than LVQ or GBM). As a first step, I divided my dataset into training and test groups, because the dataset already marks each sample as belonging to the train or test set. Then I did 10-fold cross-validation. I tuned my model and assessed its performance on the test dataset (using a ROC curve). Everything goes fine up to this step.
I have another dataset, Dataset2, which doesn't have mutation information for Gene A. What I want to do now is apply my tuned SVM model from Dataset1 to Dataset2 to see whether it can give me the mutation status of Gene A in Dataset2 in the form of Mut/Not-Mut. I've gone through the caret package guide but I couldn't work it out. I am stuck here and don't know what to do.
I am not sure whether I chose the right approach. Any suggestions or help would really be appreciated.
Here is my code up to the point where I tuned my model on the first dataset.
Selecting training and test sets from the first dataset:
# Packages used below
library(caret)
library(pROC)
library(doParallel)
M_train <- Dataset1[Dataset1$Case == 'train', -1]  # creating train feature data frame
M_test <- Dataset1[Dataset1$Case == 'test', -1]    # creating test feature data frame
y <- as.factor(M_train$Class)                      # target variable for training
ctrl <- trainControl(method = "repeatedcv",             # 10-fold cross-validation
                     number = 10,
                     repeats = 5,                        # do 5 repetitions of CV
                     summaryFunction = twoClassSummary,  # use AUC to pick the best model
                     classProbs = TRUE)
# Use expand.grid to specify the search space
# Note that the default search grid selects 3 values of each tuning parameter
grid <- expand.grid(interaction.depth = seq(1, 4, by = 2),  # tree depths 1 and 3
                    n.trees = seq(10, 100, by = 10),         # let iterations go from 10 to 100
                    shrinkage = c(0.01, 0.1),                # try 2 values for the learning rate
                    n.minobsinnode = 20)
# Set up for parallel processing
#set.seed(1951)
registerDoParallel(4, cores = 2)
# Train and tune the SVM
# Use only the predictor columns as x so the Class label does not leak into the model
predictors <- setdiff(names(M_train), "Class")
svm.tune <- train(x = M_train[, predictors],
                  y = y,
                  method = "svmRadial",
                  tuneLength = 9,                  # 9 values of the cost parameter
                  preProc = c("center", "scale"),
                  metric = "ROC",
                  trControl = ctrl)                # same as for gbm above
# Finally, assess the performance of the model using the test data set.
# Make predictions on the test data with the SVM model
svm.pred <- predict(svm.tune, M_test[, predictors])
confusionMatrix(svm.pred, factor(M_test$Class, levels = levels(y)))
svm.probs <- predict(svm.tune, M_test[, predictors], type = "prob")  # class probabilities for ROC
svm.ROC <- roc(predictor = svm.probs$mut,
               response = as.factor(M_test$Class),
               levels = levels(y))
plot(svm.ROC, main = "ROC for SVM built with GA selected features")
So here is where I am stuck: how can I use the svm.tune model to predict the mutation status of Gene A in Dataset2?
Thanks in advance,
Now you just take the model you built and tuned and predict from it using predict:
D2.predictions <- predict(svm.tune, newdata = Dataset2)
The key is to be sure that you have ALL of the same predictor variables in this set, with the same column names (and, in my paranoid world, in the same order).
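One way to make that check explicit before calling predict is a small helper that compares the columns of the new data against the predictor names used in training, stops if any are missing, and returns the columns in the training order. This is only a sketch in base R: align_predictors is a hypothetical helper (not a caret function), and the toy data frames and column names (geneX, geneY, geneZ, SampleID) are made up for illustration.

```r
# Hypothetical helper (not part of caret): keep only the training predictors,
# in the training order, and fail loudly if any are missing from the new data.
align_predictors <- function(newdata, predictor_names) {
  missing_cols <- setdiff(predictor_names, names(newdata))
  if (length(missing_cols) > 0) {
    stop("New data is missing predictor(s): ",
         paste(missing_cols, collapse = ", "))
  }
  newdata[, predictor_names, drop = FALSE]
}

# Toy stand-ins for the real data (made-up column names and values)
train_x <- data.frame(geneX = c(1.2, 0.4), geneY = c(3.1, 2.2), geneZ = c(0.5, 0.9))
new_x   <- data.frame(geneZ = c(0.7, 0.1), SampleID = c("s1", "s2"),
                      geneY = c(2.5, 2.9), geneX = c(0.3, 1.1))

# Extra columns (SampleID) are dropped and the order follows the training data
aligned <- align_predictors(new_x, names(train_x))
print(names(aligned))
```

With the real objects, the same idea would be to pass align_predictors(Dataset2, predictor_names) as newdata, where predictor_names is whatever set of column names was given to train as x.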
D2.predictions will contain your predicted classes for the unlabeled data.
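If class probabilities are more useful than hard labels, predict on a caret model trained with classProbs = TRUE also accepts type = "prob" (the question's code already uses this on the test set) and returns a data frame with one column per class. The sketch below shows how such a table could be turned back into labels with a custom cutoff. It is an illustration only: label_by_cutoff is a hypothetical helper, the probabilities are made up, and while the mut column name comes from the question's code, not_mut is an assumed name for the other class.

```r
# Hypothetical helper: label a sample as the positive class when its
# predicted probability reaches the cutoff, otherwise as the negative class.
label_by_cutoff <- function(probs, positive, negative, cutoff = 0.5) {
  labels <- ifelse(probs[[positive]] >= cutoff, positive, negative)
  factor(labels, levels = c(positive, negative))
}

# Made-up probabilities in the shape that predict(..., type = "prob") returns
probs <- data.frame(mut = c(0.91, 0.35, 0.62),
                    not_mut = c(0.09, 0.65, 0.38))

# A stricter cutoff than 0.5 calls fewer samples "mut"
calls <- label_by_cutoff(probs, positive = "mut", negative = "not_mut", cutoff = 0.7)
print(table(calls))
```

Since Dataset2 has no true labels, there is nothing to score these calls against; the cutoff would have to be chosen from the Dataset1 test-set ROC curve.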