简体   繁体   English

在R中插入符号中调整模型时如何保持参数恒定?

[英]How to keep parameter constant when tuning model in caret in R?

The following code: 如下代码:

require(caret)
require(plyr)

portuguese_scores = read.table("https://raw.githubusercontent.com/JimGorman17/Datasets/master/student-por.csv",sep=";",header=TRUE, stringsAsFactors = FALSE)
portuguese_scores <- portuguese_scores[,!names(portuguese_scores) %in% c("school", "age", "G1", "G2")]
median_score <- summary(portuguese_scores$G3)['Median']
portuguese_scores$score_gte_than_median <- as.factor(median_score<=portuguese_scores$G3)
portuguese_scores <- portuguese_scores[,!names(portuguese_scores) %in% c("G3")]

portuguese_scores$sex <- as.numeric(mapvalues(portuguese_scores$sex, from = c("M", "F"), to = c(0, 1)))
portuguese_scores$address <- as.numeric(mapvalues(portuguese_scores$address, from = c("U", "R"), to = c(0, 1)))
portuguese_scores$famsize <- as.numeric(mapvalues(portuguese_scores$famsize, from = c("LE3", "GT3"), to = c(0, 1)))
portuguese_scores$Pstatus <- as.numeric(mapvalues(portuguese_scores$Pstatus, from = c("T", "A"), to = c(0, 1)))
portuguese_scores$Mjob <- as.numeric(mapvalues(portuguese_scores$Mjob, from = c("at_home","health","other","services","teacher"), to = c(0, 1,2,3,4)))
portuguese_scores$Fjob <- as.numeric(mapvalues(portuguese_scores$Fjob, from = c("at_home","health","other","services","teacher"), to = c(0, 1,2,3,4)))
portuguese_scores$reason <- as.numeric(mapvalues(portuguese_scores$reason, from = c("course","home","other","reputation"), to = c(0, 1,2,3)))
portuguese_scores$guardian <- as.numeric(mapvalues(portuguese_scores$guardian, from = c("father","mother","other"), to = c(0, 1,2)))
portuguese_scores$schoolsup <- as.numeric(mapvalues(portuguese_scores$schoolsup, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$famsup <- as.numeric(mapvalues(portuguese_scores$famsup, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$paid <- as.numeric(mapvalues(portuguese_scores$paid, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$activities <- as.numeric(mapvalues(portuguese_scores$activities, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$nursery <- as.numeric(mapvalues(portuguese_scores$nursery, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$higher <- as.numeric(mapvalues(portuguese_scores$higher, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$internet <- as.numeric(mapvalues(portuguese_scores$internet, from = c("no","yes"), to = c(0, 1)))
portuguese_scores$romantic <- as.numeric(mapvalues(portuguese_scores$romantic, from = c("no","yes"), to = c(0, 1)))

normalize <- function(x){ return( (x - min(x) )/( max(x) - min(x) ) )}
port_n <- data.frame(lapply(portuguese_scores[1:28], normalize), portuguese_scores[29])

set.seed(123)

train_sample <- sample(nrow(port_n), .9 * nrow(port_n))
port_train <- port_n[train_sample,]
port_test <- port_n[-train_sample,]

out1 <- train(port_train[,1:28], port_train[,29], method = "svmRadial")
out1

Generates the following output: 生成以下输出:

Support Vector Machines with Radial Basis Function Kernel 

584 samples
 28 predictor
  2 classes: 'FALSE', 'TRUE' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 584, 584, 584, 584, 584, 584, ... 
Resampling results across tuning parameters:

  C     Accuracy   Kappa      Accuracy SD  Kappa SD  
  0.25  0.7383930  0.4633478  0.02782725   0.05484469
  0.50  0.7382364  0.4637857  0.02883617   0.05763094
  1.00  0.7290191  0.4456935  0.02570423   0.05180727

Tuning parameter 'sigma' was held constant at a value of 0.02166535
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were sigma = 0.02166535 and C = 0.25. 

My Question: 我的问题:

  • How can I hold c constant (at 0.25) and find the optimal sigma? 如何保持c不变(为0.25)并找到最佳sigma?

UPDATE (to all close voters): 更新(对所有亲密的选民):

  • Do you see how caret kept sigma constant and optimized for C? 您看到插入符号如何使sigma保持恒定并针对C优化吗?
  • I simply want to do the reverse. 我只是想做相反的事情。 I want to keep C constant and optimize for sigma. 我想保持C不变并为sigma优化。
  • And I believe that there are tuning parameters to help me do this, I just don't know the syntax. 而且我相信有调整参数可以帮助我做到这一点,我只是不知道语法。

In order to do this you need to use the tuneGrid argument. 为此,您需要使用tuneGrid参数。 You need to create your own pairs for the parameters and then test them. 您需要为参数创建自己的对,然后对其进行测试。

For example, since you want to test for C=0.25 on all occasions, you need to create a data.frame that looks like this: 例如,由于要在所有情况下测试C = 0.25,因此需要创建一个如下所示的data.frame:

svmGrid <- data.frame(C=rep(0.25,10), sigma=1:10/100)

This has the same value for C (0.25) and different values for sigma to optimize over. C具有相同的值(0.25),而σ具有不同的值以进行优化。 You need to provide these values for sigma yourself (this is only an example - use as many as you want). 您需要自己为sigma提供这些值(这只是一个示例-使用任意数量)。

In other words, according to the above data.frame, your svm model will be tested 10 times. 换句话说,根据上面的data.frame,您的svm模型将被测试10次。 Each time C will be constant and equal to 0.25 and sigma will take values from 0.01 to 0.1 with a step of 0.01. 每次C都是常数且等于0.25时,sigma的取值范围为0.01到0.1,步长为0.01。 10 tests will occur and the best combination will be chosen. 将进行10个测试,并选择最佳组合。

And then you run the model like this: 然后像这样运行模型:

#adding the tuneGrid argument
out1 <- train(port_train[,1:28], port_train[,29], method = "svmRadial", tuneGrid=svmGrid)

Output: 输出:

> out1
Support Vector Machines with Radial Basis Function Kernel 

584 samples
 28 predictor
  2 classes: 'FALSE', 'TRUE' 

No pre-processing
Resampling: Bootstrapped (25 reps) 

Summary of sample sizes: 584, 584, 584, 584, 584, 584, ... 

Resampling results across tuning parameters:

  sigma  Accuracy   Kappa      Accuracy SD  Kappa SD  
  0.01   0.7297315  0.4417768  0.03082764   0.06044173
  0.02   0.7312643  0.4474754  0.03289345   0.06567919
  0.03   0.7301472  0.4468033  0.03618417   0.07187019
  0.04   0.7288286  0.4463212  0.03609275   0.07200966
  0.05   0.7281374  0.4466735  0.03569426   0.07055105
  0.06   0.7238098  0.4400315  0.03348371   0.06666725
  0.07   0.7213752  0.4364012  0.03467845   0.06849882
  0.08   0.7175949  0.4286502  0.04013475   0.08014780
  0.09   0.7042396  0.3981745  0.04346037   0.08864786
  0.10   0.6651296  0.3061489  0.06450228   0.14079631

Tuning parameter 'C' was held constant at a value of 0.25
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were sigma = 0.02 and C = 0.25. 

And you have your optimized sigma! 并且您有优化的sigma!

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM