Machine learning in R is slow with a decision tree

I'm trying to predict the type of a vehicle (its model) from the vehicle identification number (VIN). The first 10 positions of the VIN say something about the type, so I use them as variables. Here is an example of the data:

positie_1_tm_3 positie_4 positie_5 positie_6 positie_7 positie_8 positie_9 positie_10          MODEL
       MBL         B         7         5         L         7         A          6     SKODA YETI
       JNF         A         A         E         1         1         U          2    NISSAN NOTE
       VWZ         Z         Z         5         Z         Z         9          4 VOLKSWAGEN FOX
       F1D         Z         0         V         0         6         4          2 RENAULT MEGANE
       NAK         U         8         1         1         C         A          5    KIA SORENTO
       F1B         R         1         J         0         H         4          1   RENAULT CLIO

I used this R code for it:

# make stratified train and test sets (60% train, 20% test, 20% second test set):
library(caret)
train.index <- createDataPartition(VIN1$MODEL, p = .6, list = FALSE)
train <- VIN1[train.index, ]
overige_data <- VIN1[-train.index, ]
test.index <- createDataPartition(overige_data$MODEL, p = .5, list = FALSE)
test <- overige_data[test.index, ]
testset2 <- overige_data[-test.index, ]

# build the decision tree:
library(rpart)
library(rpart.plot)
library(rattle)
library(RColorBrewer)
tree <- rpart(MODEL ~ ., data = train, method = "class")

But the last step, building the tree, has been running for more than 2 weeks already. The dataset is around 3 million rows, so the training set is around 1.8 million rows. Is it running so long because it's too much data for rpart, or is there another problem?

Something is obviously wrong. Fitting may take a long time, but not 2 weeks.

The question is: how many labels (classes) are there? Decision trees tend to be slow when the number of classes is large (by large I mean more than 50).
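To answer that, it may help to count the classes before fitting anything. Below is a minimal sketch in base R; the small VIN1 data frame is only a stand-in built from a few of the sample columns and rows in the question, so the counts it prints say nothing about the real data. It also counts the distinct values in each predictor, on the assumption (worth checking against the rpart documentation) that a categorical predictor with many levels can add to the split search time when the response has more than two classes.

```r
# Toy stand-in for the VIN1 data frame from the question. Only three of the
# predictor columns and the six sample rows are used here, for brevity.
VIN1 <- data.frame(
  positie_1_tm_3 = c("MBL", "JNF", "VWZ", "F1D", "NAK", "F1B"),
  positie_4      = c("B", "A", "Z", "Z", "U", "R"),
  positie_10     = c("6", "2", "4", "2", "5", "1"),
  MODEL          = c("SKODA YETI", "NISSAN NOTE", "VOLKSWAGEN FOX",
                     "RENAULT MEGANE", "KIA SORENTO", "RENAULT CLIO"),
  stringsAsFactors = TRUE
)

# Number of distinct classes in the response (unused factor levels dropped).
n_classes <- nlevels(droplevels(VIN1$MODEL))

# Number of distinct values in each predictor column.
predictors <- setdiff(names(VIN1), "MODEL")
n_levels <- sapply(VIN1[predictors], function(col) length(unique(col)))

print(n_classes)
print(sort(n_levels, decreasing = TRUE))
```

If the real data turns out to have hundreds of models, one option is to time rpart on a small random sample first and see how the run time grows before committing to the full training set.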
