std::bad_alloc: out_of_memory: CUDA error when importing data / running models

Question

I am trying to load a dataset into an NVIDIA RAPIDS Jupyter notebook, but this error keeps popping up when I import the dataset or use XGBoost on a dask dataframe. The training dataset is 3.7 GB. I only have one GPU.

Some specs:

CPU: i7 9700F @ 4.00GHz
GPU: 3070 8GB GDDR6
RAM: 16GB @ 3600MHz
Windows 11
Ubuntu 18.04.05 (running the RAPIDS environment)
RAPIDS version 22.12
CUDA version 12.0
NVIDIA-SMI version 528.02

I tried using this: https://www.kaggle.com/getting-started/140636 but I think the problem goes deeper.

import cudf
import dask_cudf
import dask_xgboost as dask_xgb
import xgboost as xgb
import tensorflow as tf
import torch

!du -sh one-hot-train.csv
> 3.7G  one-hot-train.csv

!du -sh y-train.csv
> 10M   y-train.csv

# Does not work due to memory issue
X_train = cudf.read_csv('one-hot-train.csv', index_col = 0)

# This will import the data no problem
X_train = dask_cudf.read_csv('one-hot-train.csv', chunksize = "4GB")
X_train = X_train.drop(columns = ['Unnamed: 0'])

# Since the y csv is so small, it doesn't matter how it's imported
y_train = dask_cudf.read_csv('y-train.csv')
y_train = y_train.drop(columns = ['Unnamed: 0'])

xgb_params = {
    
    'learning_rate': 0.3,
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    'max_depth': 6,
    'seed': 555,
    'predictor': 'gpu_predictor',
    'eval_metric': 'aucpr',
    'n_estimators': 5000,
    
}

# Does not work due to memory issue
xgb_model = dask_xgb.XGBClassifier(**xgb_params)
xgb_model.fit(X_train, y_train)

This is the specific error:

> MemoryError: std::bad_alloc: out_of_memory: CUDA error at: ~/miniconda3/envs/rapids-22.12/include/rmm/mr/device/cuda_memory_resource.hpp

Answer 1

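If you are using a single GPU, Dask XGBoost won't help, because the entire dataset still needs to fit into memory to train the model. You should either use Dask XGBoost with multiple GPUs or train this model on a single, larger GPU. XGBoost does provide an experimental external memory interface for training on larger-than-memory datasets, but it is not ready for production use.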
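To get a feel for why a 3.7 GB CSV may not fit on an 8 GB card, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not confirmed in the question: that most cells are single-digit 0/1 flags taking about 2 bytes of text each (digit plus delimiter), and that the parser stores each cell in a fixed-width numeric type. The function name and its parameters are made up for illustration.

```python
GIB = 1024 ** 3


def estimate_in_memory_bytes(csv_bytes, text_bytes_per_cell=2, parsed_bytes_per_cell=8):
    """Rough size of a numeric CSV once every cell is parsed to a fixed-width type.

    Both per-cell sizes are assumptions, not measurements.
    """
    n_cells = csv_bytes // text_bytes_per_cell
    return n_cells * parsed_bytes_per_cell


csv_size = int(3.7 * GIB)  # size reported by `du -sh` in the question
gpu_memory = 8 * GIB       # total memory of the 8GB card, before any overhead

# Compare a few candidate cell widths (8-byte, 4-byte, 1-byte).
for width in (8, 4, 1):
    estimate = estimate_in_memory_bytes(csv_size, parsed_bytes_per_cell=width)
    fits = estimate < gpu_memory
    print(f"{width}-byte cells: ~{estimate / GIB:.1f} GiB, below 8 GiB: {fits}")
```

Even when the estimate lands below 8 GiB, the parser and XGBoost both need working memory on top of the data itself, so treat the comparison as optimistic.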

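Also, judging by the file name, it looks like you are one-hot encoding your data. You don't need to one-hot encode your data with recent versions of XGBoost. See the XGBoost categorical data documentation for details.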
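As a small illustration of why dropping the one-hot step can save memory, the sketch below encodes the same column both ways using only the standard library. The `colors` column and the helper names are hypothetical. As far as I know, XGBoost's categorical support is switched on with `enable_categorical=True` on category-typed columns, but check the documentation for your version before relying on that.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def category_codes(values):
    """Return (categories, codes): a single integer code per row."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return categories, [index[value] for value in values]


colors = ["red", "green", "blue", "green", "red"]  # hypothetical column

cats, one_hot = one_hot_encode(colors)
_, codes = category_codes(colors)

cells_one_hot = sum(len(row) for row in one_hot)  # rows * number of categories
cells_codes = len(codes)                          # one cell per row

print("categories:", cats)
print("one-hot cells:", cells_one_hot)
print("integer-code cells:", cells_codes)
```

The one-hot form grows with the number of distinct categories, while the integer-coded form stays at one cell per row, which matters when every cell has to live in GPU memory.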

Solution 1
1 2023-01-24 14:36:31
