
Text file classification in R: from KNN to SVM

My problem is that I don't understand how to move to an SVM. I currently get about 20% errors with KNN, so I want to improve on that. I work on HTML files that I load into a VCorpus, clean, and convert to a DTM. I then find the most frequent words and use about 1000 files to predict the right class for a single file (I have 7 classes). Code below:

corpusEntrainement <- VCorpus(DirSource("training", recursive = TRUE))

corpusCleanEntrainement <- nettoyage(corpusEntrainement)

motsFrequentsEntrainement <- findFreqTerms(corpusMatrice,lowfreq = 400, highfreq = 1200)

corpusDocReduitEntrainement <- DocumentTermMatrix(corpusCleanEntrainement,list(dictionary=motsFrequentsEntrainement))

dataReduitEntrainement <- as.matrix(corpusDocReduitEntrainement[, motsFrequentsEntrainement])

classesEntrainement<-c(rep(1,150),rep(2,150),rep(3,150),rep(4,150),rep(5,150),rep(6,150),rep(7,150))

matriceFinaleEntrainement <- cbind(dataReduitEntrainement,"classes"=classesEntrainement)

So this is how I clean my corpus and get a final matrix. How can I move from this to an SVM? I think the other parts of the code will be simple; I just want to get the documents into an SVM.

Thanks !

I'm assuming that you're looking for how to train an SVM model (it's not very clear from the question).

library(e1071)

svmfit = svm(classes ~ ., data = matriceFinaleEntrainement)

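Note that you may need to convert the class to a factor beforehand, otherwise svm() treats a numeric response as a regression target rather than as class labels: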

classesEntrainement<-as.factor(c(rep(1,150),rep(2,150),rep(3,150),rep(4,150),rep(5,150),rep(6,150),rep(7,150)))

See for instance this tutorial for details.
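
One caveat about the factor conversion: cbind() onto a numeric matrix turns a factor back into plain numbers, so the labels are safer kept in their own vector or in a data frame. Here is a minimal sketch of both routes, assuming e1071 is installed. The toy matrix x and the names labels, y, fit, fit2 and pred are made up for illustration; in the question, x would be dataReduitEntrainement and y the factor of classes.

```r
library(e1071)

# Toy stand-in for a document-term matrix: 70 "documents", 5 "terms".
# (Hypothetical data, not the asker's corpus.)
set.seed(42)
labels <- rep(1:7, each = 10)
x <- matrix(rnorm(70 * 5, mean = rep(labels, times = 5)), nrow = 70,
            dimnames = list(NULL, paste0("term", 1:5)))
y <- as.factor(labels)

# cbind() coerces the factor to its integer codes, so the class column
# of the combined matrix is no longer a factor.
is.factor(cbind(x, classes = y)[, "classes"])   # FALSE

# Route 1: x/y interface, labels kept as a separate factor.
fit <- svm(x = x, y = y)

# Route 2: formula interface on a data frame, which preserves the factor.
fit2 <- svm(classes ~ ., data = data.frame(x, classes = y))

# Predict on the training data and compare with the true labels.
pred <- predict(fit, x)
table(predicted = pred, actual = y)
mean(pred == y)   # training accuracy, not an estimate of test error
```

For a fair comparison with the KNN error rate, predict() would be called on the document-term matrix of held-out files built with the same dictionary, not on the training data.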
