
Hyperparameter tuning one-class SVM

I am looking for a package or a 'best practice' approach to automated hyper-parameter selection for a one-class SVM using a Gaussian (RBF) kernel. I am currently using libsvm's one-class SVM in R, so an approach that incorporates that, or at least R, would be preferred.


EDIT

Just to give a clearer example of what I am looking for: take the iris dataset and treat one of the species as the positive class. One approach is to build one-class SVMs with differing choices of nu and gamma and then validate each model's accuracy against the negative cases (the other two species). See below:

library(datasets)
library(data.table)
library(e1071)
#load the iris data
data(iris)
#separate positive and negative cases
positive_cases <- iris[iris$Species=="virginica",1:4]
negative_cases <- iris[iris$Species!="virginica",1:4]
#get hyperparameter choices
hyp_param_choices <- setDT(expand.grid("nu"=seq(.1,.3,by=.1),
                                       "gamma"=1*10^seq(-2, 2, by=1)))
hyp_param_choices[,err:=0]

for(hyp_i in 1L:nrow(hyp_param_choices)){
  tuned <- svm(x = positive_cases,
               y = rep(TRUE, nrow(positive_cases)), #TRUE as they are all in the positive class
               nu = hyp_param_choices[hyp_i, nu],
               gamma = hyp_param_choices[hyp_i, gamma],
               type = 'one-classification',
               scale = TRUE #scale the data
  )
  #predict the negative cases; these should all be FALSE
  svm_neg_pred <- predict(tuned, negative_cases)
  #error is the number of negatives predicted TRUE (false positives) divided by the total number of negatives
  set(hyp_param_choices, i = hyp_i, j = "err", value = sum(svm_neg_pred)/nrow(negative_cases))
}
setorder(hyp_param_choices,err)
print(hyp_param_choices)
     nu gamma  err
 1: 0.1 1e+00 0.00
 2: 0.2 1e+00 0.00
 3: 0.3 1e+00 0.00
 4: 0.1 1e+01 0.00
 5: 0.2 1e+01 0.00
 6: 0.3 1e+01 0.00
 7: 0.1 1e+02 0.00
 8: 0.2 1e+02 0.00
 9: 0.3 1e+02 0.00
10: 0.3 1e-02 0.01
11: 0.2 1e-01 0.01
12: 0.2 1e-02 0.02
13: 0.3 1e-01 0.02
14: 0.1 1e-01 0.03
15: 0.1 1e-02 0.05
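
One thing this output makes clear is that many (nu, gamma) pairs tie at err = 0 against the negatives. A minimal sketch of one way to break those ties (my own addition, reusing the positive_cases and hyp_param_choices objects defined above; the train_rej column name is just illustrative) is to also record how many of the training positives each model rejects, since a model that rejects no negatives but also rejects most of its own class is of little use:

#also record the training rejection rate for each (nu, gamma) to break ties at err = 0
hyp_param_choices[, train_rej := 0]
for(hyp_i in 1L:nrow(hyp_param_choices)){
  fit <- svm(x = positive_cases,
             nu = hyp_param_choices[hyp_i, nu],
             gamma = hyp_param_choices[hyp_i, gamma],
             type = 'one-classification',
             scale = TRUE)
  #fraction of the training positives the fitted model itself rejects
  set(hyp_param_choices, i = hyp_i, j = "train_rej",
      value = mean(!predict(fit, positive_cases)))
}
setorder(hyp_param_choices, err, train_rej)
print(hyp_param_choices)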

Now in truth, my problem has some false positives in the training data. We could incorporate that into the example above by adding a sample of the negatives to the positives, excluding those sampled negatives from the validation set, and rerunning:

#sample 10 negatives into the positives and exclude them from the validation negatives
contam_idx <- sample(nrow(negative_cases), 10)
positive_cases <- rbind(iris[iris$Species=="virginica",1:4],
                        negative_cases[contam_idx,])
negative_cases <- negative_cases[-contam_idx,]

I am looking for another approach, from a paper or otherwise, to choosing the best one-class hyperparameters, ideally one with some reasoning for why it is a good approach.


To give some background, I am aware of the original formulation of one-class SVMs by Scholkopf et al., and I understand that the aim is to map the one-class data into the feature space corresponding to the kernel and to separate them from the origin with a maximum-margin hyperplane. In this sense the origin can be thought of as standing in for all other classes. I am also aware of SVDD, introduced by Tax & Duin, where the aim is to find the smallest possible sphere enclosing the data; all points outside the sphere are treated as other classes/outliers. I also know that these two approaches reduce to equivalent optimisation problems when a Gaussian kernel is used. Both use soft margins, allowing for some misclassified cases within the one class. As they are equivalent I will only talk about OC-SVMs, but answers framed in terms of SVDD would also be greatly appreciated!
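
For reference, as I understand them, the two formulations are (with $\phi$ the feature map of the kernel, $n$ training points and slack variables $\xi_i$):

OC-SVM (Scholkopf et al.):
$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \quad \text{s.t.}\quad \langle w,\phi(x_i)\rangle \ge \rho - \xi_i,\ \xi_i \ge 0,$$
with decision function $f(x) = \operatorname{sgn}(\langle w,\phi(x)\rangle - \rho)$.

SVDD (Tax & Duin):
$$\min_{R,\,a,\,\xi}\ R^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad \lVert \phi(x_i) - a\rVert^2 \le R^2 + \xi_i,\ \xi_i \ge 0.$$

For kernels with constant $k(x,x)$, such as the Gaussian kernel $k(x,x') = \exp(-\gamma\lVert x - x'\rVert^2)$, the two problems give equivalent solutions with $C = 1/(\nu n)$.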

So in my problem, my one class consists of the positive cases, and I want to optimise nu, which relates to the proportion of misclassified cases (false positives), and gamma, the width of the Gaussian kernel. In my problem I know there will be false positives; it is the nature of the problem and they cannot be detected. I also want to apply multiple OC-SVMs to different datasets, so I need an automated approach to tuning nu and gamma based on the proportion of outliers present in the dataset in question and the data's latent features.
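
As a rough automated starting point (a minimal sketch of my own, not taken from any particular paper or package; the names x_scaled, gamma_med and expected_outlier_frac are just illustrative): set nu to the expected contamination rate and derive gamma from the data via the common "median heuristic", i.e. take the kernel width from the median pairwise distance of the scaled training data:

#nu from the (assumed known) expected contamination rate,
#gamma from the median pairwise distance of the scaled training data
library(e1071)
x_scaled <- scale(positive_cases)
med_dist <- median(as.numeric(dist(x_scaled)))
gamma_med <- 1/(2*med_dist^2)
expected_outlier_frac <- 0.1 #assumption: rough domain knowledge of the contamination rate
oc_fit <- svm(x = x_scaled, type = 'one-classification',
              nu = expected_outlier_frac, gamma = gamma_med, scale = FALSE)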

As this problem is essentially unsupervised, I obviously can't use CV in the normal manner over a range of nu and gamma, since the solution with the minimum distance from the origin would simply be chosen. Just to note, I do have negative cases, but I would rather hold them back from a validation step if possible; otherwise, why bother with a one-class approach at all rather than a normal two-class classifier?

My question is whether anyone has found a package or approach to do this in R? I know there are plenty of approaches in the scientific literature, including two very promising ones: DTL and here, but these don't seem to have code available beyond pseudocode, and translating that into R and incorporating it with libsvm seems a big step for my current abilities.

Any help or suggestions at all would be greatly appreciated!

Your question is about SVM implementation. Here I include a sketch for an SVM with an RBF kernel. The implementation in this post uses caret, with the method taken from the kernlab package, and an example using the iris dataset with Species as a multiclass outcome. I have sketched the training side; the test side can easily be done with predict() over the test set and confusion matrices from the same caret package, or with a multiclass AUROC.

The method also uses 10-fold cross-validation (number = 10):

#Some libraries
library(rsample)
library(caret)
library(visdat)
library(recipes)
#Data
data(iris)
# Create training (70%) and test (30%) sets
set.seed(123)
split_strat <- initial_split(iris, prop = 0.7,
                             strata = 'Species')
train_strat <- training(split_strat)
test_strat <- testing(split_strat)

#Tuning a SVM model

# Tune an SVM with radial basis kernel
set.seed(1854) # for reproducibility
model_svm <- caret::train(
  Species ~ .,
  data = train_strat,
  method = "svmRadial",
  trControl = trainControl(method = "cv", number = 10),
  tuneLength = 10
)

# Plot results
ggplot(model_svm) + theme_light()

[plot of the cross-validation tuning results produced by ggplot(model_svm)]

You can go deeper and research the method by looking at kernlab, which includes more tuning-parameter options that can be added to the caret framework. I hope this is useful for you.
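
For completeness, a minimal sketch of the test side mentioned above (assuming the model_svm and test_strat objects from the code):

#predict the held-out 30% and summarise with a confusion matrix from caret
test_pred <- predict(model_svm, newdata = test_strat)
caret::confusionMatrix(test_pred, test_strat$Species)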
