R - Strange error/warning after SVM training (e1071)

Question

I get a strange warning after training an e1071 SVM. It is a multiclass text-document classification on a large (10000x1000) sparse document-term matrix (DTM). It seems that something is wrong with the features (columns).

summary(svmModel) works. The results could be better (as always ;) ).

However, something is wrong, and this may be why the results are inconsistent.

> svmModel <- svm(labels ~., data= train[,-1], cross = 10, seed = 1234, kernel="linear")

Warning message:
In svm.default(x, y, scale = scale, ..., na.action = na.action) :
  Variable(s) ‘abgebildet’ and 
...
‘could’ and  [... truncated]

Answer 1

Check your training dataset for variables that contain no values (all zeros). One way to do this is to take the sum of every column, excluding the response:

colSums(train[, colnames(train) != yvar])  # yvar: name of the response column, e.g. "labels"
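As a minimal sketch of this check, the toy data frame below stands in for the DTM (the data and the column names w1, w2, w3 are hypothetical; labels is the response, as in the question's formula). Besides all-zero columns, it also flags constant non-zero columns with a zero-variance test, since those can't be standardized either.

```r
# Toy stand-in for the training DTM (hypothetical data); "labels" is the response
train <- data.frame(labels = factor(c("a", "b", "a", "b")),
                    w1 = c(1, 0, 2, 0),
                    w2 = c(0, 0, 0, 0),
                    w3 = c(3, 3, 3, 3))

# Keep only the predictor columns
features <- train[, colnames(train) != "labels"]

# Columns whose sum is 0 (all zeros, assuming non-negative term counts)
zero_cols <- names(features)[colSums(features) == 0]

# More generally, columns with zero variance (any constant value)
const_cols <- names(features)[vapply(features, function(col) var(col) == 0, logical(1))]

# Drop the constant columns before calling svm()
train_clean <- train[, !colnames(train) %in% const_cols]
```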

If the sum is 0 for an independent variable that I can't remove, I usually take a stratified sample as the training dataset, so that both of its values are represented. This is usually done for a flag variable taking the values 0 and 1.

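# stratified sampling on a 0/1 flag column ("emptyvar" is a placeholder name)
library(sampling)
# strata() expects the data sorted by the stratification variable
train <- train[order(train$emptyvar), ]
# draw 1000 rows with emptyvar == 0 and 500 rows with emptyvar == 1, without replacement
Training <- strata(train, stratanames = "emptyvar", size = c(1000, 500), method = "srswor")
strata.train <- getdata(train, Training)
# getdata() adds 3 extra columns (ID_unit, Prob, Stratum), which you can remove
train <- strata.train[, !colnames(strata.train) %in% c("ID_unit", "Prob", "Stratum")]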
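If you'd rather not depend on the sampling package, a rough base-R equivalent might look like the sketch below. stratified_sample() is a hypothetical helper, not part of any package, and the toy data is made up. It draws a fixed number of rows, without replacement, from each value of the flag column, and it assumes every stratum has at least as many rows as requested (sample.int() stops with an error otherwise).

```r
# Hypothetical helper: draw sizes[[v]] rows without replacement for each value v of `flag`
stratified_sample <- function(df, flag, sizes) {
  idx <- unlist(lapply(names(sizes), function(v) {
    rows <- which(as.character(df[[flag]]) == v)
    # index into `rows` so that a stratum with a single row is handled correctly
    rows[sample.int(length(rows), sizes[[v]])]
  }))
  df[idx, , drop = FALSE]
}

set.seed(1234)
# Toy data: 30 rows with flag 0 and 10 rows with flag 1
toy <- data.frame(flag = rep(c(0, 1), times = c(30, 10)), x = 1:40)
s <- stratified_sample(toy, "flag", c("0" = 6, "1" = 4))
table(s$flag)
```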

Alternatively, you can add scale = FALSE to your svm() call and scale your variables beforehand. This stops svm() from scaling the variables itself, which leads to z-values of NaN for empty variables: scaling divides by the column's standard deviation, and for an all-zero column that is 0. However, you'll still want to scale your other variables, which you can do manually.

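library(plyr)
cols <- 1:5  # say you want to scale the first 5 variables
# z-score a column; constant (zero-sd) columns are returned unchanged instead of becoming NaN
standardize <- function(x) {
  s <- sd(x)
  if (is.na(s) || s == 0) return(as.numeric(x))
  as.numeric((x - mean(x)) / s)
}
train[cols] <- plyr::colwise(standardize)(train[cols])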
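To see where the NaN comes from, and to keep manual scaling consistent between training and test data, here is a minimal base-R sketch. fit_scaler() and apply_scaler() are hypothetical helper names and the data is made up. Because svm() is told scale = FALSE, it won't rescale new data at prediction time, so the sketch computes the means and standard deviations on the training data only and reuses them for the test data. Zero-variance columns are centered but not divided.

```r
# Base R's scale() divides by the column standard deviation, which is 0 for a constant column
x <- data.frame(w1 = c(1, 0, 2, 0), w2 = c(0, 0, 0, 0))
z <- scale(x)
print(z[, "w2"])  # all NaN, because (0 - 0) / 0 is NaN

# Hypothetical helpers: learn centering/scaling on training data, reuse them elsewhere
fit_scaler <- function(df) {
  list(center = vapply(df, mean, numeric(1)),
       scale  = vapply(df, sd, numeric(1)))
}
apply_scaler <- function(df, sc) {
  s <- sc$scale
  s[s == 0] <- 1  # constant columns are only centered, never divided by 0
  as.data.frame(scale(df, center = sc$center, scale = s))
}

train_x <- x
test_x  <- data.frame(w1 = c(3, 1), w2 = c(1, 0))
sc <- fit_scaler(train_x)
train_scaled <- apply_scaler(train_x, sc)
test_scaled  <- apply_scaler(test_x, sc)  # uses the training means and sds
```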

Answer 2

If some words occur rarely, then it is quite possible that the corresponding features in the training data contain only 0s. I believe this is what causes the warning.
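A toy illustration of this effect, using made-up counts: a word that occurs in only one document produces an all-zero training column whenever that document lands outside the training split. Filtering terms by their document frequency in the training rows (the threshold of 1 here is an arbitrary assumption) removes such columns. The same keep vector is then applied to the test rows so that both matrices share the same columns.

```r
# Toy document-term matrix: 6 documents x 3 terms (made-up counts, filled column by column)
dtm <- matrix(c(1, 0, 2, 1, 0, 3,   # "common" appears in several documents
                0, 0, 0, 0, 0, 1,   # "rare" appears only in document 6
                1, 1, 0, 0, 1, 0),  # "medium"
              nrow = 6,
              dimnames = list(NULL, c("common", "rare", "medium")))

train_idx <- 1:4  # documents 1 to 4 for training, 5 and 6 for testing
train_dtm <- dtm[train_idx, , drop = FALSE]

# Number of training documents containing each term
doc_freq <- colSums(train_dtm > 0)
print(doc_freq)  # "rare" has 0: its training column is all zeros

# Keep terms that occur in at least one training document (threshold is an assumption)
keep <- doc_freq >= 1
train_dtm <- train_dtm[, keep, drop = FALSE]
test_dtm  <- dtm[-train_idx, keep, drop = FALSE]
```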

Answer 1: score 3, posted 2017-08-07 09:57:46

Answer 2: score 2, posted 2014-05-30 17:09:35
