
std::bad_alloc: out_of_memory: CUDA error when importing data/running models

I'm trying to load a dataset into an NVIDIA RAPIDS Jupyter notebook, but this error keeps popping up when importing the dataset or when using XGBoost on a Dask dataframe. The training dataset is 3.7 GB in size, and I only have one GPU.

Some specs:

  • CPU: i7 9700F @ 4.00GHz
  • GPU: RTX 3070 8GB GDDR6
  • RAM: 16GB @ 3600MHz
  • Windows 11
  • Ubuntu 18.04.05 (running the RAPIDS environment)
  • RAPIDS version 22.12
  • CUDA version 12.0
  • NVIDIA-SMI version 528.02

I tried the suggestions in https://www.kaggle.com/getting-started/140636, but I think this issue goes deeper.

import cudf
import dask_cudf
import dask_xgboost
import xgboost as xgb
import tensorflow as tf
import torch
!du -sh one-hot-train.csv
> 3.7G  one-hot-train.csv
!du -sh y-train.csv
> 10M   y-train.csv
# Does not work due to memory issue
X_train = cudf.read_csv('one-hot-train.csv', index_col = 0)
# This will import the data no problem
X_train = dask_cudf.read_csv('one-hot-train.csv', chunksize = "4GB")
X_train = X_train.drop(columns = ['Unnamed: 0'])

# Since the y csv is so small, it doesn't matter how it's imported
y_train = dask_cudf.read_csv('y-train.csv')
y_train = y_train.drop(columns = ['Unnamed: 0'])
xgb_params = {
    'learning_rate': 0.3,
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    'max_depth': 6,
    'seed': 555,
    'predictor': 'gpu_predictor',
    'eval_metric': 'aucpr',
    'n_estimators': 5000,
}

# Does not work due to memory issue
xgb_model = dask_xgboost.XGBClassifier(**xgb_params)
xgb_model.fit(X_train, y_train)

Here's the specific error:

> MemoryError: std::bad_alloc: out_of_memory: CUDA error at: ~/miniconda3/envs/rapids-22.12/include/rmm/mr/device/cuda_memory_resource.hpp

Using Dask XGBoost does not help if you only have a single GPU, because the entire dataset still needs to fit in GPU memory to train the model. You should either use Dask XGBoost with multiple GPUs or use a single, larger GPU to train this model. XGBoost does provide an experimental external-memory interface for training on larger-than-memory datasets, but it's not ready for production use.
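For reference, here's a minimal sketch of that experimental external-memory interface (available from roughly XGBoost 1.5 onward), assuming the training data has first been split into CSV shards; the shard file names and the 'target' column are hypothetical:

import cudf
import xgboost as xgb

# Iterator that feeds XGBoost one shard at a time instead of loading everything at once.
class CSVShardIter(xgb.DataIter):
    def __init__(self, paths):
        self._paths = paths
        self._i = 0
        super().__init__(cache_prefix='xgb-cache')  # batches get cached to disk

    def next(self, input_data):
        if self._i == len(self._paths):
            return 0  # no more batches
        shard = cudf.read_csv(self._paths[self._i])
        input_data(data=shard.drop(columns=['target']), label=shard['target'])
        self._i += 1
        return 1

    def reset(self):
        self._i = 0

it = CSVShardIter(['train-shard-0.csv', 'train-shard-1.csv'])
dtrain = xgb.DMatrix(it)  # builds an external-memory DMatrix from the iterator
booster = xgb.train({'tree_method': 'gpu_hist', 'objective': 'binary:logistic'}, dtrain)

Each shard only has to fit in GPU memory on its own, at the cost of slower training; as noted above, this path was still experimental at the time.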

Separately, it looks like you're one-hot encoding your data, based on the file name. You don't need to one-hot encode your data with recent versions of XGBoost. See the XGBoost categorical documentation for more information.
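As an illustration, a minimal sketch of that native categorical support (experimental from XGBoost 1.5 onward), assuming you still have the raw, non-encoded CSV; the file and column names here are hypothetical:

import cudf
import xgboost as xgb

df = cudf.read_csv('train-raw.csv')
df['city'] = df['city'].astype('category')  # mark categorical columns instead of one-hot encoding them

clf = xgb.XGBClassifier(
    tree_method='gpu_hist',    # categorical splits were tied to (gpu_)hist in this era
    enable_categorical=True,   # let XGBoost consume the category dtype directly
    objective='binary:logistic',
)
clf.fit(df.drop(columns=['target']), df['target'])

Skipping the one-hot expansion also shrinks the feature matrix considerably, which directly reduces the GPU memory pressure behind the original error.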
