I am trying to understand how ctree fits and predicts observations that are missing all predictors. For example:
library(partykit)
airq <- subset(airquality, !is.na(Ozone))
airq <- rbind(airq,data.frame(Ozone=rnorm(50),Solar.R=NA,Wind=NA,Temp=NA,Month=NA,Day=NA))
airct <- ctree(Ozone ~ ., data = airq,control = ctree_control(majority = TRUE))
table(tail(predict(airct,type="node"),50))
The last 50 rows of airq are missing all predictors, and from reading the documentation I get the impression that with majority=TRUE they will simply follow the majority, meaning they should all end up in the same node with no variation at all. And yet I get a distribution of predicted nodes for them.
So is majority=TRUE working correctly? By the way, I tried tracing the code to see how the majority argument is used and saw that line #104 in partykit:::.cnode has:
prob <- numeric(0) + 1L:length(prob) %in% which.max(prob)
which looks rather strange to me, as the result will always be numeric(0).
This is/was a bug in the handling of the majority control argument. It has recently been fixed in the R-Forge repository (see https://R-Forge.R-project.org/R/?group_id=261 ) but has not yet been released to CRAN. After running
install.packages("partykit", repos = "http://R-Forge.R-project.org")
everything should work as expected. A date for the CRAN release has not yet been scheduled, but I think it should not be too far in the future.
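The zero-length result noted in the question can be reproduced in base R without partykit. The sketch below uses a made-up prob vector (an assumption for illustration, not values from a fitted tree), and the second expression is only one plausible way to build a 0/1 indicator for the largest entry; it is not claimed to be the change actually made on R-Forge.

```r
# Hypothetical child-node probabilities; not taken from a real tree.
prob <- c(0.2, 0.5, 0.3)

# Expression quoted in the question. numeric(0) has length zero, and
# arithmetic with a zero-length vector returns a zero-length result.
buggy <- numeric(0) + 1L:length(prob) %in% which.max(prob)
print(buggy)      # numeric(0)

# One possible indicator for the most likely branch (an assumption,
# not necessarily the fix used in partykit).
indicator <- as.numeric(seq_along(prob) %in% which.max(prob))
print(indicator)  # [1] 0 1 0
```

Because the result is empty, the majority information never reaches the splitting step in this form, which is consistent with the all-missing rows being spread over several nodes.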