
rtx 2070s failed to allocate gpu memory from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

tf 2.0.0-gpu, CUDA 10.0, RTX 2070 Super

Hi. I have a problem with allocating GPU memory. The initial allocation of memory is about 7 GB, like this:

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6994 MB memory)

2020-01-11 22:19:22.983048: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-11 22:19:23.786225: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 2.78G (2989634304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-01-11 22:19:24.159338: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0

Limit: 7333884724  InUse: 5888382720  MaxInUse: 6255411968  NumAllocs: 1264  MaxAllocSize: 2372141056

But I can only use about 5900 MB of memory, and allocating the rest always fails.

My guess was that to use the whole GPU memory of the RTX 2070S I should use two data types (float16 and float32), so I enabled a mixed-precision policy with this code:

opt = tf.keras.optimizers.Adam(1e-4)

opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

Still, the allocation always fails.
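(For reference, the graph-rewrite call above is one way to enable mixed precision. In newer TF 2.x releases (roughly 2.1+) the same effect can be obtained through the experimental Keras mixed-precision policy; the sketch below illustrates that API and is not taken from the original post.)

import tensorflow as tf

# Build layers with float16 compute and float32 variables (requires TF 2.1+).
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)

# Wrap the optimizer in a loss-scale optimizer to avoid float16 underflow.
opt = tf.keras.optimizers.Adam(1e-4)
opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(opt, loss_scale='dynamic')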

Tensorflow memory management can be frustrating.

Main takeaway: whenever you see OOM there is actually not enough memory and you either have to reduce your model size or batch size. TF would throw OOM when it tries to allocate sufficient memory, regardless of how much memory has been allocated before.
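As a minimal illustration (model, x_train and y_train are hypothetical placeholders for whatever you are training), the quickest fix is usually to cut the batch size passed to fit:

# Hypothetical example: same training call, smaller batches.
model.fit(x_train, y_train, batch_size=16, epochs=10)  # e.g. instead of batch_size=64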

At the start, TF tries to allocate a reasonably large chunk of memory, equivalent to about 90-98% of the total memory available - 5900 MB in your case. Then, when the actual data starts to take more than that, TF additionally tries to allocate a sufficient amount of memory, or a bit more - 2.78 GB. And if that does not fit, it throws OOM, as in your case. Your GPU could not fit 5.9 + 2.8 GB. The last chunk of 2.78 GB might actually be a little more than TF needs, but it would be used later anyway if you have multiple training steps, because the maximum required memory can fluctuate a bit between identical Session.run's.
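If you only want to stop TF from grabbing that big up-front chunk (this does not create more memory, it just changes when allocation happens), a minimal sketch for TF 2.0 looks like the following; the GPU index and the 4096 MB cap are illustrative values, and the calls must run before the GPU is first used:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Option 1: grow allocations on demand instead of reserving ~90-98% up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2 (use instead of option 1, not together): hard-cap this process
    # to a fixed amount of GPU memory, e.g. 4096 MB.
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

Either way, if the model genuinely needs more memory than the card has, you will still hit OOM, and the only real fix remains a smaller model or batch size.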
