Ensemble in R using SVM

Question

I'm trying to classify some data using SVM in R.

The data set:

D1 | D2 | D3 | word1 | word2 |...
1  | 2  | 3  | 0     | 1     |
3  | 2  | 1  | 1     | 0     |

D1, D2, D3 take values from 0 to 9 and each word takes a 0/1 value.

First I want to build a classifier that predicts D1 from word1, word2, etc. Then I want to build a classifier that predicts D2 from the predicted D1 and the words. D1, D2 and D3 were originally a single three-digit number, and each digit is related to the one before it.

So far I have:

trainD1 <- train[,-1]
trainD1$D2 <- NULL
trainD1$D3 <- NULL

modelD1 <- svm( train$D1~., trainD1, type="C-classification")

But I'm completely lost, any help is welcome.

Thanks

Answer 1

I'm sure you already know this, but I want to cover my bases: if D1 and D2 are predictive of D3, then it will always be better to use the actual values of D1 and D2 rather than predictions of them.

I will assume for the purposes of this question that D1 and D2 may not be present in your prediction data set, so that's why you have to predict them. It may still be more accurate to directly predict D3 from the "word" variables, but that's outside of the scope of this question.

library(e1071)

train <- read.csv("trainingSmallExtra.csv")

# Stage 1: predict D1 from the words
d1 <- svm(  x = train[,5:100], # arbitrary subset of words
            y = train$D1,
            gamma = 0.1)

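# With no new data, predict() returns the fitted values for the training rows
d1.predict <- predict(d1)
train      <- cbind(d1.predict, train)  # the new first column shifts the words to 6:101
x_names    <- c("d1.predict", colnames(train)[6:101])

# Stage 2: predict D2 from the D1 prediction plus the words
d2 <- svm(  x = train[,x_names],  # d1 prediction + arbitrary subset of words
            y = train$D2,
            gamma = 0.1)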

d2.predict <- predict(d2)
train      <- cbind(d2.predict, train)

x_names <- c("d1.predict", "d2.predict", colnames(train)[25:150]) 

final <- svm(  x = train[,x_names], 
               y = train$D3,
               gamma = 0.1)

summary(final)

Call:
svm.default(x = train[, x_names], y = train$D3, gamma = 0.1)

Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.1
    epsilon:  0.1

Number of Support Vectors:  932
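
Note that the summary reports eps-regression: because train$D3 is numeric, svm() fits a regression rather than a classifier. Since the question asks for classification of digits, one option is to pass the target as a factor, in which case svm() defaults to C-classification. A minimal sketch on made-up data (the matrix X and the target y below are hypothetical, not the asker's data set):

```r
library(e1071)

set.seed(1)
# Hypothetical toy data: 200 rows of 10 binary word indicators
n <- 200
X <- matrix(rbinom(n * 10, 1, 0.5), nrow = n,
            dimnames = list(NULL, paste0("word", 1:10)))
# Made-up digit-like target tied to the first three words (values 0..6)
y <- X[, "word1"] + 2 * X[, "word2"] + 3 * X[, "word3"]

# Numeric target: svm() defaults to eps-regression
fit_reg <- svm(x = X, y = y, gamma = 0.1)

# Factor target: svm() defaults to C-classification
fit_cls <- svm(x = X, y = factor(y), gamma = 0.1)

class(fitted(fit_reg))  # fitted values are numeric
class(fitted(fit_cls))  # fitted values are a factor of predicted labels
```

With a factor target the predictions are class labels, so they need converting back to numbers if they are to be used as a numeric feature in a later stage.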

This is just to show you the process. In your code you will want to use more of the words and set whatever options you think are most appropriate.
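
One thing the process above does not show is how to score rows the models have not seen. predict() with no new data only returns fitted values for the training rows, so for new rows each stage's prediction has to be regenerated and bound to the words in the same column order used for training. A rough two-stage sketch on made-up data (X, D1, D2 and the helper as_digit are all hypothetical names for illustration):

```r
library(e1071)

set.seed(2)
# Hypothetical toy data: binary word indicators and two related digit targets
n <- 300
X <- matrix(rbinom(n * 8, 1, 0.5), nrow = n,
            dimnames = list(NULL, paste0("word", 1:8)))
D1 <- X[, "word1"] + 2 * X[, "word2"]   # values 0..3
D2 <- (D1 + X[, "word3"]) %% 4          # depends on D1 and one word

train_idx <- 1:200
new_idx   <- 201:300

# Helper: turn predicted factor labels back into numbers for use as a feature
as_digit <- function(f) as.numeric(as.character(f))

# Stage 1: predict D1 from the words
m1 <- svm(x = X[train_idx, ], y = factor(D1[train_idx]), gamma = 0.1)

# Stage 2: predict D2 from the stage-1 prediction plus the words
X2_train <- cbind(d1.predict = as_digit(predict(m1, X[train_idx, ])),
                  X[train_idx, ])
m2 <- svm(x = X2_train, y = factor(D2[train_idx]), gamma = 0.1)

# New rows: rebuild the same columns, in the same order, from predictions only
X2_new  <- cbind(d1.predict = as_digit(predict(m1, X[new_idx, ])),
                 X[new_idx, ])
d2_pred <- predict(m2, X2_new)

# Share of unseen rows where the predicted D2 matches the true D2
mean(as_digit(d2_pred) == D2[new_idx])
```

The true D1 and D2 of the new rows are never used as inputs here, which matches the assumption that they are missing at prediction time.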

I recommend using a holdout sample or cross-validation for benchmarking performance. Compare the ensemble model with a single model that tries to predict D3 directly from the words by examining their performance benchmarks.
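
A holdout comparison along those lines might look like the sketch below. It uses made-up data and a 70/30 split chosen arbitrarily; the helpers fit, as_digit, add1 and add2 are hypothetical names, and the accuracies it prints say nothing about how the two approaches would rank on the real data.

```r
library(e1071)

set.seed(3)
# Hypothetical toy data: D2 depends on D1, and D3 depends on D2
n <- 400
X <- matrix(rbinom(n * 8, 1, 0.5), nrow = n,
            dimnames = list(NULL, paste0("word", 1:8)))
D1 <- X[, "word1"] + 2 * X[, "word2"]
D2 <- (D1 + X[, "word3"]) %% 4
D3 <- (D2 + X[, "word4"]) %% 4

# Random 70/30 split into training rows (tr) and holdout rows (ho)
tr <- sample(n, size = round(0.7 * n))
ho <- setdiff(seq_len(n), tr)

as_digit <- function(f) as.numeric(as.character(f))
fit <- function(x, y) svm(x = x, y = factor(y), gamma = 0.1)

# Direct model: D3 from the words only
direct     <- fit(X[tr, ], D3[tr])
acc_direct <- mean(as_digit(predict(direct, X[ho, ])) == D3[ho])

# Chained model: D1 -> D2 -> D3, each stage fed the earlier predictions
m1   <- fit(X[tr, ], D1[tr])
add1 <- function(rows) cbind(p1 = as_digit(predict(m1, X[rows, ])), X[rows, ])
m2   <- fit(add1(tr), D2[tr])
add2 <- function(rows) cbind(p2 = as_digit(predict(m2, add1(rows))), add1(rows))
m3   <- fit(add2(tr), D3[tr])
acc_chain <- mean(as_digit(predict(m3, add2(ho))) == D3[ho])

# Holdout accuracy of each approach, side by side
c(direct = acc_direct, chained = acc_chain)
```

Both models are scored on the same holdout rows, and the chained model only ever sees predicted digits, never the true D1 or D2 of those rows.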

solution1
2 ACCEPTED 2016-07-03 21:31:59
