
Memory allocation error: Call to XGBoost C function XGBoosterUpdateOneIter failed: std::bad_alloc

I'm working in a Julia notebook on SageMaker, on an ml.m5d.24xlarge instance with 500GB memory.

I'm training an XGBoost model with 230 features (500MB per file on average). It trains without any issue on up to 205 files, but beyond that I randomly get this error:

> ┌ Info: Starting XGBoost training
└   num_boost_rounds = 99
ERROR: LoadError: Call to XGBoost C function XGBoosterUpdateOneIter failed: std::bad_alloc
Stacktrace:
  [1] error(::String, ::String, ::String, ::String)
    @ Base ./error.jl:42
  [2] XGBoosterUpdateOneIter(handle::Ptr{Nothing}, iter::Int32, dtrain::Ptr{Nothing})
    @ XGBoost ~/.julia/packages/XGBoost/fI0vs/src/xgboost_wrapper_h.jl:11
  [3] #update#21
    @ ~/.julia/packages/XGBoost/fI0vs/src/xgboost_lib.jl:204 [inlined]
  [4] xgboost(data::XGBoost.DMatrix, nrounds::Int64; label::Type, param::Vector{Any}, watchlist::Vector{Any}, metrics::Vector{String}, obj::Type, feval::Type, group::Vector{Any}, kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{15, Symbol}, NamedTuple{(:objective, :num_class, :num_parallel_tree, :eta, :gamma, :max_depth, :min_child_weight, :max_delta_step, :subsample, :colsample_bytree, :lambda, :alpha, :tree_method, :grow_policy, :max_leaves), Tuple{String, Int64, Int64, Float64, Float64, Int64, Int64, Int64, Float64, Float64, Int64, Int64, String, String, Int64}}})
    @ XGBoost ~/.julia/packages/XGBoost/fI0vs/src/xgboost_lib.jl:185
  [5] macro expansion
    @ /home/src/Training.jl:175 [inlined]
  [6] macro expansion
    @ ./timing.jl:210 [inlined]

I'm not sure how to fix it. The AWS instance already has the maximum CPU memory, and I'm already using 99 procs/workers.

This looks like you're trying to allocate more memory than is available on the machine.
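A rough back-of-envelope check can show how close the job is to that limit. The sketch below is in Python and uses only the numbers from the question (205 files at about 500MB each). The `estimate_memory_gb` helper and its `expansion` and `copies` parameters are hypothetical, not measured values: they stand in for how much the data grows once parsed into memory and how many copies are held at once.

```python
def estimate_memory_gb(n_files, mb_per_file, expansion=1.0, copies=1):
    """Rough in-memory footprint in GB (1 GB = 1000 MB).

    `expansion` and `copies` are assumptions for illustration,
    not values measured on the actual workload.
    """
    return n_files * mb_per_file * expansion * copies / 1000.0


# Numbers from the question: 205 files at roughly 500MB each.
raw_gb = estimate_memory_gb(205, 500)
print(f"raw data: {raw_gb:.1f} GB")

# Hypothetical scenario: data doubles in size when parsed and two
# copies are alive at once (for example, raw arrays plus a DMatrix).
worst_gb = estimate_memory_gb(205, 500, expansion=2.0, copies=2)
print(f"2x expansion, 2 copies: {worst_gb:.1f} GB")
```

The raw files alone stay well under the stated 500GB, so if memory really is the cause, the extra usage probably comes from overhead on top of the raw data. Watching actual memory use on the instance while loading would confirm or rule this out.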

Unfortunately, there is not much to do here other than sub-sample your dataset or try a larger instance.
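As a minimal sketch of the sub-sampling option, assuming each file can be read as a sequence of rows, the hypothetical helper below keeps a random fraction of rows before anything is handed to XGBoost. It is written in Python with only the standard library; the same idea carries over to Julia.

```python
import random


def subsample_rows(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [row for row in rows if rng.random() < fraction]


# Toy data standing in for the rows of one input file.
rows = [[i, i * 2] for i in range(1000)]
sample = subsample_rows(rows, fraction=0.5)
print(len(sample), "of", len(rows), "rows kept")
```

This is different from the `subsample` training parameter visible in the stack trace: that parameter samples rows for each boosting round, but the full dataset still has to be loaded into memory first.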

An alternative is to try distributed training, using something like Dask: https://xgboost.readthedocs.io/en/stable/tutorials/dask.html
