
Calling OneHotEncoding results in Java Heap Space Error

When running the XGBoost package in H2O, a Java heap space error is thrown. But when the memory is cleared manually, it works fine.

I often use del df, del something, and import gc; gc.collect() in order to clear memory. Any ideas are appreciated.
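For reference, a minimal sketch of that manual clean-up pattern (the frame name df is just a placeholder, and h2o.remove_all() is an optional extra step not mentioned above):

import gc
import h2o

h2o.init()

# small example frame standing in for the question's "df"
df = h2o.H2OFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# drop the Python-side reference and force garbage collection
del df
gc.collect()

# optionally also free the backing data held by the H2O cluster
h2o.remove_all()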

import h2o
from h2o.tree import H2OTree
from h2o.estimators import H2OIsolationForestEstimator, H2OXGBoostEstimator

encoding = "one_hot_explicit"

# modelId, predictor_columns, train and valid are defined elsewhere in the script
baseModel = H2OXGBoostEstimator(model_id = modelId, ntrees = 100,
                                max_depth = 3, seed = 0xDECAF,
                                sample_rate = 1,
                                categorical_encoding = encoding,
                                keep_cross_validation_predictions = True,
                                nfolds = 10)

## TRAIN DATA
baseModel.train(x = predictor_columns, y = "label", training_frame = train.rbind(valid))

Error Trace:

Traceback (most recent call last):
      File "/docs/code/000_pyGraph/dec_rf_gb_xgb.py", line 151, in <module>
        decxgb.xgb_cvs(df=df, year=year, model_path=model_path, 
      File "/docs/code/000_pyGraph/dec_xgb.py", line 90, in xgb_cvs
        baseModel.train(x = predictor_columns, y = "label", training_frame = train.rbind(valid))
      File "/home/miniconda3/envs/tf-gpu-mem-day/lib/python3.10/site-packages/h2o/estimators/estimator_base.py", line 123, in train
        self._train(parms, verbose=verbose)
      File "/home/miniconda3/envs/tf-gpu-mem-day/lib/python3.10/site-packages/h2o/estimators/estimator_base.py", line 215, in _train
        job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
      File "/home/miniconda3/envs/tf-gpu-mem-day/lib/python3.10/site-packages/h2o/job.py", line 90, in poll
        raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
    OSError: Job with key $03017f00000132d4ffffffff$_8508e7043b6647f7868aa83a3f6842d4 failed with an exception: DistributedException from /127.0.0.1:54321: 'Java heap space', caused by java.lang.OutOfMemoryError: Java heap space
    stacktrace: 
    DistributedException from /127.0.0.1:54321: 'Java heap space', caused by java.lang.OutOfMemoryError: Java heap space
        at water.MRTask.getResult(MRTask.java:660)
        at water.MRTask.getResult(MRTask.java:670)
        at water.MRTask.doAll(MRTask.java:530)
        at water.MRTask.doAll(MRTask.java:412)
        at water.MRTask.doAll(MRTask.java:397)
        at water.fvec.Vec.doCopy(Vec.java:514)
        at water.fvec.Vec.makeCopy(Vec.java:500)
        at water.fvec.Vec.makeCopy(Vec.java:493)
        at water.fvec.Vec.makeCopy(Vec.java:487)
        at water.util.FrameUtils$CategoricalOneHotEncoder$CategoricalOneHotEncoderDriver.compute2(FrameUtils.java:768)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:1677)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
    Caused by: java.lang.OutOfMemoryError: Java heap space

XGBoost in H2O actually runs outside the H2O cluster (Java heap), so when you start up H2O with h2o.init() you need to make sure to leave enough RAM to run both H2O and XGBoost. I think that might be the issue you're having, but please let me know if this does not fix it.

We recommend leaving at least 1/3 of the RAM for XGBoost. So if you have 30GB RAM, give 20GB to H2O (h2o.init(max_mem_size="20G")), which leaves 10GB for XGBoost.
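For example, a minimal sketch of that sizing (the 20G value assumes a 30GB machine; adjust it to your own hardware):

import h2o

# Cap the H2O cluster (Java heap) at 20GB so roughly 10GB of RAM
# stays free for the external XGBoost process.
h2o.init(max_mem_size="20G")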


 