partykit::ctree randomness in majority=TRUE

I am trying to understand how ctree fits and predicts observations that are missing every predictor. For example:

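library(partykit)
set.seed(1)  # arbitrary seed so the simulated rows are reproducible
# keep only the rows with an observed response
airq <- subset(airquality, !is.na(Ozone))
# append 50 rows with a simulated response and all predictors missing
airq <- rbind(airq, data.frame(Ozone = rnorm(50), Solar.R = NA, Wind = NA, Temp = NA, Month = NA, Day = NA))
airct <- ctree(Ozone ~ ., data = airq, control = ctree_control(majority = TRUE))
# terminal nodes assigned to the 50 all-missing rows
table(tail(predict(airct, type = "node"), 50))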

The last 50 rows of airq are missing all predictors. From reading the documentation, I get the impression that with majority=TRUE such observations simply follow the majority at each split, meaning they should all end up in the same node with no variation at all. And yet the predicted nodes for these rows are spread across several terminal nodes.

So

  1. Is my understanding of how majority=TRUE works correct?
  2. How does ctree fit/predict rows that don't have any observed predictors?

By the way, I tried tracing the code to see how the majority argument is used, and I see that line #104 in partykit:::.cnode has:

prob <- numeric(0) + 1L:length(prob) %in% which.max(prob)

which looks rather strange to me, as the result will always be numeric(0).
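
A quick check in plain R illustrates the point. The prob vector below is made up for illustration, and the as.numeric() variant is only a guess at what the line might have been meant to compute, not the actual fix in partykit.

```r
# Hypothetical daughter-node probabilities, for illustration only
prob <- c(0.3, 0.7)

# The expression from .cnode: `:` and %in% bind tighter than `+`, so this is
# numeric(0) + <logical vector>, and arithmetic with a zero-length operand
# returns a zero-length result
buggy <- numeric(0) + 1L:length(prob) %in% which.max(prob)
length(buggy)  # 0

# Assumed intent: a 0/1 indicator that puts all weight on the majority side
onehot <- as.numeric(seq_along(prob) %in% which.max(prob))
onehot  # 0 1
```

If the indicator ends up empty, the majority rule cannot take effect, which would be consistent with the all-missing rows still being spread over several nodes.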

This is a bug in the handling of the majority control argument. It has recently been fixed in the R-Forge repository (see https://R-Forge.R-project.org/R/?group_id=261 ), but the fix has not yet been released to CRAN. After running

install.packages("partykit", repos = "http://R-Forge.R-project.org")

everything should work as expected. A date for the CRAN release has not been scheduled yet, but I think it should not be too far off.
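
One way to check the behaviour after updating is to rerun the example from the question and count the distinct nodes assigned to the all-missing rows. This is only a sketch: the seed is arbitrary, and the expectation that all 50 rows share a single node assumes the fix behaves as described above.

```r
library(partykit)
packageVersion("partykit")  # confirm which version is actually loaded

set.seed(1)  # arbitrary seed, only for reproducibility
airq <- subset(airquality, !is.na(Ozone))
airq <- rbind(airq, data.frame(Ozone = rnorm(50), Solar.R = NA, Wind = NA,
                               Temp = NA, Month = NA, Day = NA))
airct <- ctree(Ozone ~ ., data = airq,
               control = ctree_control(majority = TRUE))

# Terminal nodes assigned to the 50 rows without any observed predictor
nodes <- tail(predict(airct, type = "node"), 50)
length(unique(nodes))  # a single distinct node is expected if majority = TRUE is honoured
```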
