Evaluating Weka classifier J48 with missing values in the test set (R, RWeka)

I get an error when evaluating a simple test set with evaluate_Weka_classifier. I am trying to learn how the interface from R to Weka works with RWeka, but I still don't understand this.

library("RWeka")
iris_input  <- iris[1:140,]
iris_test <- iris[-(1:140),]
iris_fit  <- J48(Species ~ ., data = iris_input)
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)

No problems here, as we would expect (it is of course a stupid test, with no random holdout data etc.). But now I want to simulate missing data (a lot of it). So I set Petal.Width to missing:

iris_test$Petal.Width <- NA
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)

Which gives the error: Error in .jcall(evaluation, "S", "toSummaryString", complexity) : java.lang.IllegalArgumentException: Can't have more folds than instances!

Edit: This error should tell me that I do not have enough instances, but I have 10.

Edit: If I use write.arff, the test set can be exported and read in by Weka. After changing Petal.Width {} to Petal.Width numeric, so that the two files match exactly, it works in Weka.

Is this a thinking error? From reading Machine Learning, Practical machine learning tools and techniques, it seems to be legitimate. Maybe I just have to tell RWeka that I want to use fractions when a split uses a missing variable?

Thanks!

The issue is that you need to tell J48() what to do with missing values.

library(RWeka)
?J48

# pertinent output (usage section of the help page)
# J48(formula, data, subset, na.action,
#     control = Weka_control(), options = NULL)

na.action tells R what to do with missing values. Following up on na.action, you will find that "The 'factory-fresh' default is na.omit". Under this setting, of course, there are not enough instances: every row of the test set has a missing Petal.Width, so every row is dropped.
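The effect of the default can be checked in base R, without Weka. This is only a sketch: it assumes that RWeka builds its instances through a model frame using the na.action in effect (taken from the explanation above, not verified against the RWeka source). The names n_omit and n_pass are introduced here purely for illustration.

```r
# Rebuild the test set from the question: 10 rows, Petal.Width all missing.
iris_test <- iris[-(1:140), ]
iris_test$Petal.Width <- NA

# With na.omit, every row contains an NA, so every row is dropped.
n_omit <- nrow(model.frame(Species ~ ., data = iris_test, na.action = na.omit))

# With na.pass, the rows are kept with their NAs intact.
n_pass <- nrow(model.frame(Species ~ ., data = iris_test, na.action = na.pass))

print(c(omit = n_omit, pass = n_pass))
```

With na.omit no rows survive, while na.pass keeps all 10. If the evaluation really sees zero instances, any positive numFolds exceeds that count, which would be consistent with the error message.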

Instead of leaving na.action at the default na.omit, I changed it as follows:

iris_fit<-J48(Species~., data = iris_input, na.action=NULL)

and it works like a charm!
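A side note on the Petal.Width {} header mentioned in the question's edit: assigning a bare NA replaces the column with a logical vector, which may be why the exported attribute was no longer declared numeric. That link is an assumption; it was not checked against write.arff. A minimal sketch of the type change, with NA_real_ as one way to keep the column numeric (test_logical and test_numeric are illustrative names):

```r
# Assigning a bare NA turns the column into a logical vector.
test_logical <- iris[-(1:140), ]
test_logical$Petal.Width <- NA

# Assigning NA_real_ keeps the column numeric (double).
test_numeric <- iris[-(1:140), ]
test_numeric$Petal.Width <- NA_real_

print(class(test_logical$Petal.Width))
print(class(test_numeric$Petal.Width))
```

Whether this alone changes what RWeka passes to Weka is untested here; it only shows that the two assignments produce different column types in R.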
