简体   繁体   中英

Abysmally slow performance of caret package, with multicore

I am going through the book " Applied Predictive Modeling " by the author of the caret package.

The first example of a training on a svm takes hours to run on my 64 bit i7 16 GB xubuntu desktop [I gave up after 4 hours]. Since this is a "toy" dataset [800 rows, 42 variables], there sure must be a way to run this in a reasonable amount of time.

library(caret)
data(GermanCredit)

library(doMC)
registerDoMC(8)

GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL

## Split the data into training (80%) and test sets (20%)
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest  <- GermanCredit[-inTrain, ]

set.seed(1056)
svmFit = train(Class ~ ., 
           data = GermanCreditTrain,
           method = "svmRadial")

Question: if this code is correct, how can it be run in a reasonable amount of time?

I ran into incredibly poor performance of svmRadial on Linux. It turns out that the issue for me too was with using multicore DoMC . svmRadial runs fine on a single core. The kernlab functions are the only ones in caret that exhibit this behaviour that I've seen. This is very frustrating, as I had to drop multicore for my entire script just to get the SVM functions working.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM