I get a CUDA_ERROR_OUT_OF_MEMORY when using images with Estimator API r1.0

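I am trying to use the Estimator API from tf.contrib.learn.estimator to build, fit, and evaluate a CNN image classifier. My code below is based on abalone.py from the Creating Estimators tutorial. In addition, I import code from the cifar10 tutorial to provide the model and the input feed. The code is as follows: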

import tensorflow as tf
import cifar10

def model_fn(features, targets, mode, params):
    # Generate predictions from cifar10 network
    logits = cifar10.inference(features)
    prediction_dict = {"classes": logits}

    # Loss operation
    loss = tf.losses.softmax_cross_entropy(targets, logits, scope='loss')

    # Metrics for evaluation
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(targets, logits, name='accuracy'),
        "precision": tf.metrics.precision(targets, logits, name='precision')
    }

    # Training operation
    train_op = tf.contrib.layers.optimize_loss(
        loss=loss,
        global_step=tf.contrib.framework.get_global_step(),
        learning_rate=params["learning_rate"],
        optimizer="SGD")

    return tf.contrib.learn.ModelFnOps(
        mode=mode,
        predictions=prediction_dict,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops
    )

def input_fn():
    features, labels = cifar10.distorted_inputs()
    return features, tf.one_hot(labels, 10)

def eval_input_fn():
    # One-hot encode the labels so evaluation targets match the training targets
    features, labels = cifar10.inputs(eval_data=True)
    return features, tf.one_hot(labels, 10)

def main(args=None):
    # Set model params
    model_params = {"learning_rate": 0.1}

    # Create and fit estimator
    nn = tf.contrib.learn.Estimator(model_fn=model_fn, params=model_params)
    nn.fit(input_fn=input_fn, steps=5000)

    # Pass the input function itself, not the result of calling it
    ev = nn.evaluate(input_fn=eval_input_fn, steps=1)
    print("Loss: %s" % ev["loss"])
    print("Accuracy: %s" % ev["accuracy"])
    print("Precision: %s" % ev["precision"])

if __name__ == '__main__':
  tf.app.run()

The error messages I receive are as follows:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.92G (8507555840 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.13G (7656800256 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 6.42G (6891120128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.78G (6202008064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.20G (5581807104 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

The error messages continue with ever smaller memory sizes and end with the following three lines:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
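
The byte counts in the log above follow a regular pattern: each failed request is about 90% of the previous one. That is consistent with the allocator first asking for nearly the whole device and then backing off. The sketch below reproduces the five logged sizes from the first one. The 9/10 factor and the rounding down to a multiple of 256 bytes are assumptions inferred from these numbers only, not taken from the TensorFlow source, and `backoff_sizes` is a hypothetical helper name.

```python
# Byte counts copied from the five "failed to allocate" lines in the log.
LOGGED = [8507555840, 7656800256, 6891120128, 6202008064, 5581807104]


def backoff_sizes(first_request, attempts, align=256):
    """Hypothetical helper: shrink a request by 10% per attempt.

    Assumption (inferred from the log only): each new size is 9/10 of
    the previous one, rounded down to a multiple of `align` bytes.
    """
    sizes = [first_request]
    for _ in range(attempts - 1):
        shrunk = sizes[-1] * 9 // 10           # integer 90% of the last request
        sizes.append(shrunk // align * align)  # round down to the alignment
    return sizes


predicted = backoff_sizes(LOGGED[0], len(LOGGED))
for size in predicted:
    # Print each size in bytes and in GiB, the unit the log abbreviates as "G"
    print("%d bytes = %.2f GiB" % (size, size / float(2 ** 30)))
```

If this reading is right, the very first request (7.92G) is close to the full capacity of the card, so the failures start before the model itself has used any memory.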

Add the following to your imports:

from tensorflow.contrib.learn.python.learn import run_config

Then, in the main method:

def main(args):
    # Set model params
    model_params = {"learning_rate": 0.1}
    # Create a RunConfig instance
    r_config = run_config.RunConfig(gpu_memory_fraction=0.6)
    #Create and fit estimator
    nn = tf.contrib.learn.Estimator(model_fn=model_fn, params=model_params, config=r_config)
    nn.fit(input_fn=input_fn, steps=5000)

