While training a deep learning model with the TensorFlow library, I get the error "ResourceExhaustedError: OOM" on the GPU (128 GB RAM). Kindly help me.
C:\Users\CVL-Acoustics\Documents\bangla-sentence-correction-master>python train.py
Sit back and relax, it will take some time to train the model...
Vocabulary size 250000
WARNING:tensorflow:From C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\ops\rnn.py:417: calling reverse_sequence (from tensorflow.python.ops.array_ops) with seq_dim is deprecated and will be removed in a future version.
Instructions for updating:
seq_dim is deprecated, use seq_axis instead
WARNING:tensorflow:From C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py:432: calling reverse_sequence (from tensorflow.python.ops.array_ops) with batch_dim is deprecated and will be removed in a future version.
Instructions for updating:
batch_dim is deprecated, use batch_axis instead
WARNING:tensorflow:From train.py:228: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
epoch 1 training
Traceback (most recent call last):
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[6656,250000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Variable_1/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: rnn/while/cond/Add/_87 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_421_rnn/while/cond/Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_clooprnn/while/cond/ArgMax/dimension/_1)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "train.py", line 321, in <module>
    _, l = sess.run([train_op, loss], fd)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[6656,250000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Variable_1/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: rnn/while/cond/Add/_87 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_421_rnn/while/cond/Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_clooprnn/while/cond/ArgMax/dimension/_1)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'MatMul', defined at:
  File "train.py", line 218, in <module>
    decoder_logits_flat = tf.add(tf.matmul(decoder_outputs_flat, W), b)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 2014, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4278, in mat_mul
    name=name)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\CVL-Acoustics\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[6656,250000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Variable_1/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: rnn/while/cond/Add/_87 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_421_rnn/while/cond/Add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_clooprnn/while/cond/ArgMax/dimension/_1)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
There are several reasons why this would happen.
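One contributing factor can be read straight off the error message: the failing allocation is a single float tensor of shape [6656, 250000], which is the output of the `tf.matmul(decoder_outputs_flat, W)` call at line 218 of train.py. The sketch below is only back-of-the-envelope arithmetic, assuming "type float" means 4-byte float32 (as `T=DT_FLOAT` suggests). The helper names `tensor_bytes` and `to_gib` are made up for illustration, and the smaller sizes in the what-if lines are hypothetical values, not recommendations taken from the question.

```python
# Rough size estimate for the tensor named in the OOM message.
# Shape comes from the error text: shape[6656,250000], type float.
# Assumption: "float" is float32, i.e. 4 bytes per element.

BYTES_PER_FLOAT32 = 4


def tensor_bytes(rows, cols, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the number of bytes a dense rows x cols tensor occupies."""
    return rows * cols * bytes_per_element


def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# The tensor from the error message.
logits_bytes = tensor_bytes(6656, 250000)
print(f"logits tensor: {logits_bytes:,} bytes = {to_gib(logits_bytes):.2f} GiB")
# prints: logits tensor: 6,656,000,000 bytes = 6.20 GiB

# Hypothetical what-if: half as many rows (e.g. a smaller batch).
half_rows_bytes = tensor_bytes(6656 // 2, 250000)
print(f"half the rows: {to_gib(half_rows_bytes):.2f} GiB")
# prints: half the rows: 3.10 GiB

# Hypothetical what-if: a vocabulary of 50,000 instead of 250,000.
small_vocab_bytes = tensor_bytes(6656, 50000)
print(f"vocabulary of 50,000: {to_gib(small_vocab_bytes):.2f} GiB")
# prints: vocabulary of 50,000: 1.24 GiB
```

So this one tensor needs about 6.2 GiB, and training typically needs more than one buffer of that shape, since the gradient with respect to the logits has the same dimensions. Note also that the allocator in the message is `GPU_0_bfc`, so the limit being hit is the GPU's own memory; the 128 GB in the question presumably refers to host RAM, which would not help here. Because the size scales linearly with both dimensions, reducing either the number of rows fed per step or the vocabulary size shrinks the allocation proportionally.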