Evaluating the Weka classifier J48 with missing values in the test set (R, RWeka)

Question

I get an error when evaluating a simple test set with evaluate_Weka_classifier. I am trying to learn how the interface from R to Weka works with RWeka, but I still don't get this.

library("RWeka")
# Train on the first 140 rows and hold out the last 10 as a test set
iris_input <- iris[1:140,]
iris_test  <- iris[-(1:140),]
iris_fit   <- J48(Species ~ ., data = iris_input)
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds = 5)

No problems here, as we would expect (it is of course a stupid test: no random holdout data, etc.). But now I want to simulate missing data (a lot of it). So I set Petal.Width to missing:

iris_test$Petal.Width <- NA
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)

Which gives the error:

Error in .jcall(evaluation, "S", "toSummaryString", complexity) :
  java.lang.IllegalArgumentException: Can't have more folds than instances!

Edit: This error says that I don't have enough instances, but I have 10.
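A quick base-R check of where the 10 instances go (a sketch that only inspects the data; it does not call RWeka): after the assignment, Petal.Width is missing in every one of the 10 test rows, so not a single row is complete. If rows with any missing value are dropped before the data reach Weka, nothing is left for the 5 folds.

```r
# Rebuild the test set from the question: the last 10 rows of iris
iris_test <- iris[-(1:140), ]
iris_test$Petal.Width <- NA

# Number of missing values per column: only Petal.Width is affected, but in every row
na_per_column <- colSums(is.na(iris_test))
print(na_per_column)

# Rows that have no missing value in any column
n_rows     <- nrow(iris_test)
n_complete <- sum(complete.cases(iris_test))
cat("rows:", n_rows, " complete rows:", n_complete, "\n")
```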

Edit: If I use write.arff, the data can be exported and read in by Weka. I changed Petal.Width {} into Petal.Width numeric to make the two files exactly the same, and then it works in Weka.
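A possible reason for the Petal.Width {} header (an assumption, not confirmed in the thread): assigning a bare NA to a data frame column replaces it with a logical vector, so the column is no longer numeric, which would explain why write.arff did not write it as numeric. Assigning NA_real_ keeps the column numeric. A small base-R sketch:

```r
test_a <- iris[-(1:140), ]
test_a$Petal.Width <- NA          # plain NA is logical, so the whole column becomes logical

test_b <- iris[-(1:140), ]
test_b$Petal.Width <- NA_real_    # NA_real_ is a missing double, so the column stays numeric

class(test_a$Petal.Width)   # "logical"
class(test_b$Petal.Width)   # "numeric"
```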

Is this a thinking error? When reading Data Mining: Practical Machine Learning Tools and Techniques, it seems to be legitimate. Maybe I just have to tell RWeka that I want to use fractions when a split uses a missing variable?
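The "fractions" idea refers to how C4.5, which J48 reimplements, treats an instance whose split attribute is missing: the instance is sent down every branch with a weight proportional to the number of training instances that took that branch, and the class distributions of the branches are combined with those weights. Below is a minimal base-R sketch of that idea only; the helper names fractional_weights and combine_distributions and the example counts are hypothetical, and this is not J48's actual code.

```r
# Hypothetical helper: weights for an instance whose split attribute is missing,
# proportional to the training counts that went down each branch.
fractional_weights <- function(branch_counts, weight = 1) {
  weight * branch_counts / sum(branch_counts)
}

# Hypothetical helper: combine per-branch class distributions
# (one row per branch, one column per class) using those weights.
combine_distributions <- function(branch_dists, branch_counts) {
  w <- fractional_weights(branch_counts)
  colSums(branch_dists * w)   # row i is scaled by w[i] (w recycles down each column)
}

# Made-up example: 30 training instances went left, 90 went right
branch_counts <- c(left = 30, right = 90)
branch_dists <- rbind(
  left  = c(setosa = 1, versicolor = 0,   virginica = 0),
  right = c(setosa = 0, versicolor = 0.8, virginica = 0.2)
)

fractional_weights(branch_counts)   # left 0.25, right 0.75
combined <- combine_distributions(branch_dists, branch_counts)
combined                            # setosa 0.25, versicolor 0.6, virginica 0.15
```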

Thanks!

Answer 1

The issue is that you need to tell J48() what to do with missing values.

library(RWeka)
?J48

# pertinent part of the help page (Usage section)
J48(formula, data, subset, na.action,
    control = Weka_control(), options = NULL)

na.action tells R what to do with missing values. When you follow up on na.action, you will find that "The 'factory-fresh' default is na.omit". Under this setting there are of course not enough instances: every test row has a missing Petal.Width, so all 10 rows are omitted and 0 instances remain for 5 folds.
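The effect of na.action can be seen with base R's model.frame, which formula interfaces typically use to assemble their data (that RWeka builds its data exactly this way is an assumption here). On the question's test set, na.omit drops every row, while na.action = NULL, which the lm documentation describes as "no action", keeps all 10:

```r
iris_test <- iris[-(1:140), ]
iris_test$Petal.Width <- NA

getOption("na.action")   # session-wide default, normally "na.omit"

# na.omit drops every row that has an NA in any model variable
mf_omit <- model.frame(Species ~ ., data = iris_test, na.action = na.omit)

# na.action = NULL means no action: rows with NAs are kept as they are
mf_keep <- model.frame(Species ~ ., data = iris_test, na.action = NULL)

nrow(mf_omit)   # 0
nrow(mf_keep)   # 10
```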

Instead of leaving na.action at its default (omit), I changed it as follows:

iris_fit <- J48(Species ~ ., data = iris_input, na.action = NULL)

and it works like a charm!

Answer 1 (score 4, accepted, 2013-11-18 17:12:59)