I get a CUDA_ERROR_OUT_OF_MEMORY when using images with Estimator API r1.0
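I am trying to use the Estimator API from tf.contrib.learn.estimator to build, fit, and evaluate a CNN image classifier. My code below is based on abalone.py from the Creating Estimators tutorial. In addition, I import code from the cifar10 tutorial to provide the model and the input feed. The code is as follows: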
import tensorflow as tf
import cifar10

def model_fn(features, targets, mode, params):
    # Generate predictions from cifar10 network
    logits = cifar10.inference(features)
    prediction_dict = {"classes" : logits}

    # Loss operation
    loss = tf.losses.softmax_cross_entropy(targets, logits, scope='loss')

    # Metrics for evaluation
    eval_metric_ops = {
        "accuracy" : tf.metrics.accuracy(targets, logits, name='accuracy'),
        "precision" : tf.metrics.precision(targets, logits, name='precision')
    }

    # Training operation
    train_op = tf.contrib.layers.optimize_loss(
        loss=loss,
        global_step=tf.contrib.framework.get_global_step(),
        learning_rate=params["learning_rate"],
        optimizer="SGD")

    return tf.contrib.learn.ModelFnOps(
        mode=mode,
        predictions=prediction_dict,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops
    )

def input_fn():
    features, labels = cifar10.distorted_inputs()
    return features, tf.one_hot(labels, 10)

def eval_input_fn():
    return cifar10.inputs(eval_data=True)

def main(args=None):
    # Set model params
    model_params = {"learning_rate": 0.1}

    # Create and fit estimator
    nn = tf.contrib.learn.Estimator(model_fn=model_fn, params=model_params)
    nn.fit(input_fn=input_fn, steps=5000)

    ev = nn.evaluate(input_fn=eval_input_fn(), steps=1)
    print("Loss: %s" % ev["loss"])
    print("Accuracy: %s" % ev["accuracy"])
    print("Precision: %s" % ev["precision"])

if __name__ == '__main__':
    tf.app.run()
The error message I receive is as follows:
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.92G (8507555840 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.13G (7656800256 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 6.42G (6891120128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.78G (6202008064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.20G (5581807104 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
The error messages continue with decreasing memory sizes and end with the following three lines:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
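As a quick sanity check on the log above, the following sketch (not part of the original post) compares each failed request with the one before it. The byte counts are copied from the five "failed to allocate" lines; the helper name backoff_ratios is made up for illustration. Each request comes out at roughly 0.9 times the previous one, which is consistent with the allocator retrying with progressively smaller requests, rather than with the model itself needing 7.92G.

```python
# Sizes (in bytes) copied from the five "failed to allocate" lines in the log.
failed_requests = [
    8507555840,
    7656800256,
    6891120128,
    6202008064,
    5581807104,
]


def backoff_ratios(sizes):
    """Return the ratio of each request to the request before it."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


ratios = backoff_ratios(failed_requests)

# Print each retry next to its ratio to the previous request.
for size, ratio in zip(failed_requests[1:], ratios):
    print("%d bytes = %.4f x previous request" % (size, ratio))
```

If that reading is right, the question is less "why does CIFAR-10 need 8 GB" and more "why can the process not get the share of GPU memory it asks for up front", which is what capping the memory fraction addresses.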
Add the following to your imports:
from tensorflow.contrib.learn.python.learn import run_config
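Then, in the main method, create a RunConfig that limits the fraction of GPU memory the process may use, and pass it to the Estimator: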
def main(args):
    # Set model params
    model_params = {"learning_rate": 0.1}

    # Create a RunConfig instance
    r_config = run_config.RunConfig(gpu_memory_fraction=0.6)

    # Create and fit estimator
    nn = tf.contrib.learn.Estimator(model_fn=model_fn, params=model_params, config=r_config)
    nn.fit(input_fn=input_fn, steps=5000)