
Memory allocation error: Call to XGBoost C function XGBoosterUpdateOneIter failed: std::bad_alloc

I'm working in a Julia notebook on SageMaker, on an ml.m5d.24xlarge instance with 500GB of memory.

I'm training an XGBoost model with 230 features (500MB per file on average). Training works without issue for up to 205 files, but beyond that I randomly get this error:

> ┌ Info: Starting XGBoost training
└   num_boost_rounds = 99
ERROR: LoadError: Call to XGBoost C function XGBoosterUpdateOneIter failed: std::bad_alloc
Stacktrace:
  [1] error(::String, ::String, ::String, ::String)
    @ Base ./error.jl:42
  [2] XGBoosterUpdateOneIter(handle::Ptr{Nothing}, iter::Int32, dtrain::Ptr{Nothing})
    @ XGBoost ~/.julia/packages/XGBoost/fI0vs/src/xgboost_wrapper_h.jl:11
  [3] #update#21
    @ ~/.julia/packages/XGBoost/fI0vs/src/xgboost_lib.jl:204 [inlined]
  [4] xgboost(data::XGBoost.DMatrix, nrounds::Int64; label::Type, param::Vector{Any}, watchlist::Vector{Any}, metrics::Vector{String}, obj::Type, feval::Type, group::Vector{Any}, kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{15, Symbol}, NamedTuple{(:objective, :num_class, :num_parallel_tree, :eta, :gamma, :max_depth, :min_child_weight, :max_delta_step, :subsample, :colsample_bytree, :lambda, :alpha, :tree_method, :grow_policy, :max_leaves), Tuple{String, Int64, Int64, Float64, Float64, Int64, Int64, Int64, Float64, Float64, Int64, Int64, String, String, Int64}}})
    @ XGBoost ~/.julia/packages/XGBoost/fI0vs/src/xgboost_lib.jl:185
  [5] macro expansion
    @ /home/src/Training.jl:175 [inlined]
  [6] macro expansion
    @ ./timing.jl:210 [inlined]

I'm not sure how to fix it. The AWS instance already has the maximum CPU memory available, and I'm already using 99 procs/workers.

`std::bad_alloc` is the C++ exception thrown when a memory allocation fails, so it looks like training is trying to allocate more memory than is available on the machine.

Unfortunately, there is not much to do here other than sub-sample your dataset or try a larger instance.
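As a rough illustration of the sub-sampling option, here is a minimal Python sketch (standard library only) that estimates what fraction of rows to keep so the data fits a memory budget, then samples rows with that probability. The function names, the `overhead_factor` of 8, and the 80% headroom are all assumptions made for illustration; the real in-memory overhead of the training matrix and the worker processes would have to be measured on your setup.

```python
import random


def subsample_fraction(n_files, avg_file_gb, budget_gb, overhead_factor):
    """Fraction of rows to keep so the estimated footprint fits the budget.

    overhead_factor is a guess at how much larger the in-memory training
    data is than the files on disk (copies, per-worker buffers, and so on).
    """
    estimated_gb = n_files * avg_file_gb * overhead_factor
    return min(1.0, budget_gb / estimated_gb)


def sample_rows(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [row for row in rows if rng.random() < fraction]


# Figures from the question: 205 files of about 0.5 GB each, 500 GB of RAM.
# overhead_factor=8.0 and the 80% headroom are assumptions, not measurements.
frac = subsample_fraction(
    n_files=205,
    avg_file_gb=0.5,
    budget_gb=0.8 * 500,
    overhead_factor=8.0,
)

# Hypothetical stand-in for the rows of one input file.
rows = list(range(1000))
kept = sample_rows(rows, frac)
print(f"keep fraction: {frac:.2f}, rows kept: {len(kept)} of {len(rows)}")
```

The same per-row sampling can be applied file by file before the data is concatenated, so the full dataset never has to be held in memory at once.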

An alternative is to try distributed training, using something like Dask: https://xgboost.readthedocs.io/en/stable/tutorials/dask.html
