
Calling OneHotEncoding results in a Java heap space error

Running the XGBoost package in H2O throws a Java heap space error, but when memory is cleared manually it works fine.

To clear memory, I often use:

    del df
    del something
    import gc
    gc.collect()

Any ideas are appreciated.
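One thing worth noting about this pattern: `del` plus `gc.collect()` only releases objects on the Python client side. An H2OFrame's data lives in the H2O cluster's Java heap, so freeing cluster memory takes `h2o.remove()`. A minimal sketch (the `h2o.remove` calls are left as comments since they need a running cluster; `df` here is just a stand-in object):

```python
import gc

# Client-side cleanup: frees only Python objects, not cluster memory.
df = list(range(1000))       # stand-in for a large client-side object
del df
unreachable = gc.collect()   # returns the number of objects collected
assert unreachable >= 0

# Cluster-side cleanup (assuming a running H2O cluster):
#   h2o.remove(train)    # drop a single frame from the Java heap
#   h2o.remove_all()     # or drop every object in the cluster
```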

import h2o
from h2o.tree import H2OTree
from h2o.estimators import H2OIsolationForestEstimator, H2OXGBoostEstimator

encoding = "one_hot_explicit"

baseModel = H2OXGBoostEstimator(model_id = modelId, ntrees = 100,
                                max_depth = 3, seed = 0xDECAF,
                                sample_rate = 1,
                                categorical_encoding = encoding,
                                keep_cross_validation_predictions = True,
                                nfolds = 10
                                )
## TRAIN DATA
baseModel.train(x = predictor_columns, y = "label", training_frame = train.rbind(valid))

Error Trace:

Traceback (most recent call last):
      File "/docs/code/000_pyGraph/dec_rf_gb_xgb.py", line 151, in <module>
        decxgb.xgb_cvs(df=df, year=year, model_path=model_path, 
      File "/docs/code/000_pyGraph/dec_xgb.py", line 90, in xgb_cvs
        baseModel.train(x = predictor_columns, y = "label", training_frame = train.rbind(valid))
      File "/home/miniconda3/envs/tf-gpu-mem-day/lib/python3.10/site-packages/h2o/estimators/estimator_base.py", line 123, in train
        self._train(parms, verbose=verbose)
      File "/home/miniconda3/envs/tf-gpu-mem-day/lib/python3.10/site-packages/h2o/estimators/estimator_base.py", line 215, in _train
        job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
      File "/home/miniconda3/envs/tf-gpu-mem-day/lib/python3.10/site-packages/h2o/job.py", line 90, in poll
        raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
    OSError: Job with key $03017f00000132d4ffffffff$_8508e7043b6647f7868aa83a3f6842d4 failed with an exception: DistributedException from /127.0.0.1:54321: 'Java heap space', caused by java.lang.OutOfMemoryError: Java heap space
    stacktrace: 
    DistributedException from /127.0.0.1:54321: 'Java heap space', caused by java.lang.OutOfMemoryError: Java heap space
        at water.MRTask.getResult(MRTask.java:660)
        at water.MRTask.getResult(MRTask.java:670)
        at water.MRTask.doAll(MRTask.java:530)
        at water.MRTask.doAll(MRTask.java:412)
        at water.MRTask.doAll(MRTask.java:397)
        at water.fvec.Vec.doCopy(Vec.java:514)
        at water.fvec.Vec.makeCopy(Vec.java:500)
        at water.fvec.Vec.makeCopy(Vec.java:493)
        at water.fvec.Vec.makeCopy(Vec.java:487)
        at water.util.FrameUtils$CategoricalOneHotEncoder$CategoricalOneHotEncoderDriver.compute2(FrameUtils.java:768)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:1677)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
    Caused by: java.lang.OutOfMemoryError: Java heap space

XGBoost in H2O actually runs outside the H2O cluster's Java heap, so when you start H2O with h2o.init() you need to make sure to leave enough RAM to run both H2O and XGBoost. I think that might be the issue you're having, but please let me know if this does not fix it.

We recommend leaving at least 1/3 of the RAM for XGBoost. So if you have 30GB of RAM, give 20GB to H2O (h2o.init(max_mem_size="20G")), which leaves 10GB for XGBoost.
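The sizing rule above can be sketched as a small helper (`h2o_heap_size` is a hypothetical function for illustration, not part of the h2o API; the two-thirds split is just the rule of thumb stated above):

```python
def h2o_heap_size(total_gb):
    """Hypothetical helper: give ~2/3 of available RAM to the H2O JVM
    heap, leaving ~1/3 for the external native XGBoost process."""
    return total_gb * 2 // 3

# With 30 GB total: 20 GB for the H2O heap, 10 GB left for XGBoost.
assert h2o_heap_size(30) == 20

# Passing the result to h2o.init (assuming the cluster is started here):
#   h2o.init(max_mem_size=f"{h2o_heap_size(30)}G")   # i.e. "20G"
```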
