How to keep a parameter constant when tuning a model in caret in R?
The following code:
require(caret)
require(plyr)
portuguese_scores = read.table("https://raw.githubusercontent.com/JimGorman17/Datasets/master/student-por.csv",sep=";",header=TRUE, stringsAsFactors = FALSE)
portuguese_scores <- portuguese_scores[,!names(portuguese_scores) %in% c("school", "age", "G1", "G2")]
median_score <- summary(portuguese_scores$G3)['Median']
portuguese_scores$score_gte_than_median <- as.factor(median_score<=portuguese_scores$G3)
portuguese_scores <- portuguese_scores[,!names(portuguese_scores) %in% c("G3")]
portuguese_scores$sex <- as.numeric(mapvalues(portuguese_scores$sex, from = c("M", "F"), to = c(0, 1)))
portuguese_scores$address <- as.numeric(mapvalues(portuguese_scores$address, from = c("U", "R"), to = c(0, 1)))
portuguese_scores$famsize <- as.numeric(mapvalues(portuguese_scores$famsize, from = c("LE3", "GT3"), to = c(0, 1)))
portuguese_scores$Pstatus <- as.numeric(mapvalues(portuguese_scores$Pstatus, from = c("T", "A"), to = c(0, 1)))
portuguese_scores$Mjob <- as.numeric(mapvalues(portuguese_scores$Mjob, from = c("at_home","health","other","services","teacher"), to = c(0, 1,2,3,4)))
portuguese_scores$Fjob <- as.numeric(mapvalues(portuguese_scores$Fjob, from = c("at_home","health","other","services","teacher"), to = c(0, 1,2,3,4)))
portuguese_scores$reason <- as.numeric(mapvalues(portuguese_scores$reason, from = c("course","home","other","reputation"), to = c(0, 1,2,3)))
portuguese_scores$guardian <- as.numeric(mapvalues(portuguese_scores$guardian, from = c("father","mother","other"), to = c(0, 1,2)))
portuguese_scores$schoolsup <- as.numeric(mapvalues(portuguese_scores$schoolsup, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$famsup <- as.numeric(mapvalues(portuguese_scores$famsup, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$paid <- as.numeric(mapvalues(portuguese_scores$paid, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$activities <- as.numeric(mapvalues(portuguese_scores$activities, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$nursery <- as.numeric(mapvalues(portuguese_scores$nursery, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$higher <- as.numeric(mapvalues(portuguese_scores$higher, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$internet <- as.numeric(mapvalues(portuguese_scores$internet, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$romantic <- as.numeric(mapvalues(portuguese_scores$romantic, from = c("no","yes"), to = c(0, 1)))
normalize <- function(x){ return( (x - min(x) )/( max(x) - min(x) ) )}
port_n <- data.frame(lapply(portuguese_scores[1:28], normalize), portuguese_scores[29])
set.seed(123)
train_sample <- sample(nrow(port_n), .9 * nrow(port_n))
port_train <- port_n[train_sample,]
port_test <- port_n[-train_sample,]
out1 <- train(port_train[,1:28], port_train[,29], method = "svmRadial")
out1
generates the following output:
Support Vector Machines with Radial Basis Function Kernel
584 samples
28 predictor
2 classes: 'FALSE', 'TRUE'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 584, 584, 584, 584, 584, 584, ...
Resampling results across tuning parameters:
C Accuracy Kappa Accuracy SD Kappa SD
0.25 0.7383930 0.4633478 0.02782725 0.05484469
0.50 0.7382364 0.4637857 0.02883617 0.05763094
1.00 0.7290191 0.4456935 0.02570423 0.05180727
Tuning parameter 'sigma' was held constant at a value of 0.02166535
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.02166535 and C = 0.25.
My Question:
UPDATE (to all close voters):
To do this, use the tuneGrid argument: build your own grid of parameter combinations and let train evaluate each one.
For example, since you want C = 0.25 in every fit, create a data.frame like this:
svmGrid <- data.frame(C=rep(0.25,10), sigma=1:10/100)
C takes the same value (0.25) in every row, while sigma varies across the values to optimize over. You have to supply the sigma values yourself (this is only an example; use as many as you want).
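If typing out rep() feels clumsy, the same grid can be sketched with base R's expand.grid, which crosses a single C value with the sigma sequence. This is only an alternative spelling of the data.frame above, not something caret requires:

```r
# Same grid as above: one fixed C crossed with ten sigma values
svmGrid <- expand.grid(C = 0.25, sigma = seq(0.01, 0.10, by = 0.01))

# 10 rows, columns C and sigma
print(svmGrid)
```

The column names still have to match the tuning parameter names of the method (here C and sigma).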
In other words, with the data.frame above, train evaluates 10 candidate models. C stays fixed at 0.25 in each one, and sigma runs from 0.01 to 0.1 in steps of 0.01. Each candidate is assessed by resampling, and the best-performing combination is chosen.
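If you have no idea which sigma values to try, one option is to let kernlab's sigest suggest a range and spread the grid across it. As far as I know, caret's default svmRadial setup also derives its single sigma from sigest, but treat that as an assumption. The sketch below uses the built-in iris data as a stand-in for port_train[,1:28], and the names x, sig_range and sigGrid are made up for illustration:

```r
library(kernlab)  # assumed to be installed; provides sigest()

# Stand-in predictors; replace with as.matrix(port_train[, 1:28])
x <- as.matrix(iris[, 1:4])

set.seed(123)
# Rough lower, middle and upper estimates for the RBF sigma
sig_range <- unname(sigest(x, scaled = TRUE))

# Ten sigma values spanning the estimated range, with C fixed at 0.25
sigGrid <- expand.grid(C = 0.25,
                       sigma = seq(sig_range[1], sig_range[3], length.out = 10))
```

sigGrid can then be passed to train through tuneGrid in the same way as svmGrid.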
Then you run the model like this:
#adding the tuneGrid argument
out1 <- train(port_train[,1:28], port_train[,29], method = "svmRadial", tuneGrid=svmGrid)
Output:
> out1
Support Vector Machines with Radial Basis Function Kernel
584 samples
28 predictor
2 classes: 'FALSE', 'TRUE'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 584, 584, 584, 584, 584, 584, ...
Resampling results across tuning parameters:
sigma Accuracy Kappa Accuracy SD Kappa SD
0.01 0.7297315 0.4417768 0.03082764 0.06044173
0.02 0.7312643 0.4474754 0.03289345 0.06567919
0.03 0.7301472 0.4468033 0.03618417 0.07187019
0.04 0.7288286 0.4463212 0.03609275 0.07200966
0.05 0.7281374 0.4466735 0.03569426 0.07055105
0.06 0.7238098 0.4400315 0.03348371 0.06666725
0.07 0.7213752 0.4364012 0.03467845 0.06849882
0.08 0.7175949 0.4286502 0.04013475 0.08014780
0.09 0.7042396 0.3981745 0.04346037 0.08864786
0.10 0.6651296 0.3061489 0.06450228 0.14079631
Tuning parameter 'C' was held constant at a value of 0.25
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.02 and C = 0.25.
And you have your optimized sigma!
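To read the selected values from the fitted object instead of the printed summary, the bestTune and results components can be used. Here is a small self-contained sketch, assuming caret and kernlab are installed. It uses a two-class subset of iris as a stand-in for the student data, and 5-fold cross-validation only to keep it quick; toy, toyGrid and fit are illustrative names:

```r
library(caret)  # assumes caret (and kernlab, used by svmRadial) are installed

# Two-class toy data from the built-in iris set (stand-in for port_train)
toy <- droplevels(subset(iris, Species != "setosa"))

# Hold C at 0.25 and vary only sigma
toyGrid <- expand.grid(C = 0.25, sigma = c(0.01, 0.05, 0.1))

set.seed(123)
fit <- train(toy[, 1:4], toy$Species,
             method = "svmRadial",
             tuneGrid = toyGrid,
             trControl = trainControl(method = "cv", number = 5))

fit$bestTune   # one-row data frame with the selected sigma and C
fit$results    # resampled Accuracy and Kappa for every grid row
```

With the model from this answer, out1$bestTune should give the same sigma and C that the last line of the printed output reports.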