H2O Stacked Ensemble Prediction ArrayIndexOutOfBoundsException

Using the h2o package for R, I created a set of base models using AutoML with stacked ensembles disabled. Thus, the set of models contains only the base models that AutoML generates by default (GLM, GBM, XGBoost, DeepLearning, and DRF). Using these base models, I was able to train a default stacked ensemble manually with the h2o.stackedEnsemble function (i.e., with a GLM metalearner using default parameters). I exported the model as a MOJO, shut down the H2O cluster, restarted R, initialized a new H2O cluster, imported the stacked ensemble MOJO, and successfully generated predictions on a new validation set.

So far so good.

Next, I repeated exactly the same process with one change: I trained the stacked ensemble with all pairwise interactions between the base models. The interactions were created automatically by passing a list of the base model IDs to the interactions metalearner parameter. The model appeared to train without issue and, as described above, I was able to export it as a MOJO, restart the H2O cluster, restart R, and import the MOJO. However, when I attempt to generate predictions on the same validation set I used above, I get the following error:

DistributedException from localhost/127.0.0.1:54321: 'null', caused by java.lang.ArrayIndexOutOfBoundsException
    at water.MRTask.getResult(MRTask.java:660)
    at water.MRTask.getResult(MRTask.java:670)
    at water.MRTask.doAll(MRTask.java:530)
    at water.MRTask.doAll(MRTask.java:549)
    at hex.Model.predictScoreImpl(Model.java:2057)
    at hex.generic.GenericModel.predictScoreImpl(GenericModel.java:127)
    at hex.Model.score(Model.java:1896)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:491)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1658)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException

Error: DistributedException from localhost/127.0.0.1:54321: 'null', caused by java.lang.ArrayIndexOutOfBoundsException

When I exported the stacked ensemble with interactions as a MOJO, I also exported it as a binary model. If I import the binary instead, the stacked ensemble with interactions generates predictions on the validation set without error.

I'm running R 4.1.2, and this was all done using h2o_3.36.0.1.

Does anyone have any suggestions for resolving this issue?

EDIT (more info): The datasets used to train and validate the model all contain continuous predictors and targets, so I do not believe this is related to one-hot encoding as might be the case for others getting this error.

Unfortunately, H2O-3 doesn't currently support exporting a GLM with interactions as a MOJO. A bug allows such a GLM to be exported, but the resulting MOJO doesn't work correctly: the interactions are replaced by missing values. This should be fixed in the next release (3.36.0.2), which will not allow that MOJO to be exported in the first place.

Beyond that, there isn't much you can do other than writing the stacked ensemble yourself in R: preprocess the base model predictions (e.g., create the interactions) and then feed them to h2o.glm. The now-unmaintained package h2oEnsemble might be helpful for that. Alternatively, you can use a more flexible metalearner, e.g., GBM.
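
To make the "interaction creation" step concrete, here is a minimal sketch in base R. It is only an illustration under assumptions: the helper name add_pairwise_interactions, the "_x_" column naming, and the toy prediction values are all made up for this example and are not part of h2o.

```r
# Hypothetical helper (not an h2o function): append the product of every
# pair of the given columns to a data frame of base-model predictions.
add_pairwise_interactions <- function(preds, cols = names(preds)) {
  # All unordered pairs of column names, e.g. ("glm", "gbm")
  pairs <- combn(cols, 2, simplify = FALSE)
  for (p in pairs) {
    new_name <- paste(p, collapse = "_x_")
    preds[[new_name]] <- preds[[p[1]]] * preds[[p[2]]]
  }
  preds
}

# Toy level-one data: made-up predictions from three base models.
level_one <- data.frame(
  glm = c(1.0, 2.0, 3.0),
  gbm = c(1.5, 2.5, 2.0),
  drf = c(0.5, 2.0, 4.0)
)

level_one_int <- add_pairwise_interactions(level_one)
print(names(level_one_int))
```

In an actual H2O workflow, the level-one frame would typically be built from the base models' predictions (ideally cross-validated holdout predictions), uploaded with as.h2o, and passed to h2o.glm together with the response column. The same transformation has to be applied to the base model predictions on any new data before scoring with the metalearner.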
