
R SVM return NA for predictions with missing data

I am attempting to make predictions using a trained SVM from package e1071, but my data contains some missing values (NA).

I would like the returned prediction to be NA for any instance that has missing values. I tried to use na.action = na.pass as below, but it gives me the error "Error in names(ret2) <- rowns : 'names' attribute [150] must be the same length as the vector [149]".

If I use na.omit, I get predictions only for the instances without missing data. How can I get predictions that include the NAs?

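library(e1071)
model <- svm(Species ~ ., data = iris)
print(length(predict(model, iris)))
# Copy the data and introduce a single missing value
tmp <- iris
tmp[1, "Sepal.Length"] <- NA
# This call fails with the 'names' attribute error quoted above
print(length(predict(model, tmp, na.action = na.pass)))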

If you are familiar with the caret package, which lets you fit 233 different types of models (including SVM from package e1071), its documentation has a section called "models clustered by tag similarity" where you can find a CSV with the data they used to group the algorithms.

There is a column there called Handle Missing Predictor Data, which tells you which algorithms can do what you want. Unfortunately SVM is not among them, but these algorithms are:

  • Boosted Classification Trees (ada)
  • Bagged AdaBoost (AdaBag)
  • AdaBoost.M1 (AdaBoost.M1)
  • C5.0 (C5.0)
  • Cost-Sensitive C5.0 (C5.0Cost)
  • Single C5.0 Ruleset (C5.0Rules)
  • Single C5.0 Tree (C5.0Tree)
  • CART (rpart)
  • CART (rpart1SE)
  • CART (rpart2)
  • Cost-Sensitive CART (rpartCost)
  • CART or Ordinal Responses (rpartScore)
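
As a rough illustration of what "handles missing predictor data" means for one entry in that list, here is a minimal sketch using rpart directly rather than through caret (that choice is mine, not something the list prescribes). Note that the tree returns an actual class for the row with the missing value instead of NA, so this is an alternative to the behaviour asked for in the question, not the same thing.

```r
library(rpart)

# Fit a classification tree on the complete iris data
tree <- rpart(Species ~ ., data = iris)

# Same test case as in the question: one missing predictor value
tmp <- iris
tmp[1, "Sepal.Length"] <- NA

# predict.rpart keeps rows with missing predictors, so every row
# gets a prediction
pred <- predict(tree, tmp, type = "class")
length(pred)
```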

If you still insist on using SVM, you could use the knnImpute option of the preProcess function from the same package, which should allow you to predict for all your observations.
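
A minimal sketch of that imputation route, under a few assumptions of mine: the SVM is fitted through the x/y interface on the four numeric iris columns, and the model is trained on the preprocessed data because knnImpute in caret also centers and scales the predictors. Depending on the caret version, the RANN package may need to be installed for knnImpute to work. This fills in the missing value and predicts for that row; it does not return NA for it.

```r
library(caret)
library(e1071)

predictors <- iris[, 1:4]

# Learn the imputation (plus centering and scaling) from the training data
pp <- preProcess(predictors, method = "knnImpute")
train_x <- predict(pp, predictors)

# Train the SVM on the transformed predictors
model <- svm(x = train_x, y = iris$Species)

# New data with one missing value, as in the question
tmp <- predictors
tmp[1, "Sepal.Length"] <- NA

# Apply the same preprocessing; the NA is replaced by a kNN estimate
new_x <- predict(pp, tmp)
pred <- predict(model, new_x)
length(pred)
```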

You could just assign the predictions for all the valid cases back to a prediction variable in the tmp set:

tmp[complete.cases(tmp), "predict"] <- predict(model, newdata=tmp[complete.cases(tmp),]) 
tmp

#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species    predict
#1             NA         3.5          1.4         0.2     setosa       <NA>
#2            4.9         3.0          1.4         0.2     setosa     setosa
#3            4.7         3.2          1.3         0.2     setosa     setosa
# ...
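
If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: predict_with_na is a hypothetical helper name, not part of e1071, and it assumes you pass the predictor column names yourself. Checking only the predictor columns means new data with an unknown (NA) response is not thrown away, which complete.cases(tmp) on the whole data frame would do.

```r
library(e1071)

# Hypothetical helper (not part of e1071): predict only on rows whose
# predictor columns are complete and return NA for every other row.
predict_with_na <- function(model, newdata, predictors) {
  ok <- complete.cases(newdata[, predictors, drop = FALSE])
  if (!any(ok)) return(rep(NA, nrow(newdata)))
  pred_ok <- predict(model, newdata[ok, , drop = FALSE])
  # Map each row to its position in pred_ok; incomplete rows stay NA
  idx <- rep(NA_integer_, nrow(newdata))
  idx[ok] <- seq_len(sum(ok))
  unname(pred_ok[idx])
}

model <- svm(Species ~ ., data = iris)
tmp <- iris
tmp[1, "Sepal.Length"] <- NA

pred <- predict_with_na(model, tmp, predictors = names(iris)[1:4])
length(pred)        # 150, one entry per row of tmp
which(is.na(pred))  # 1, the row with the missing value
```

Indexing with NA keeps the factor levels for classification and also works for the numeric vector a regression SVM would return.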
