Inception v3 retraining error (flower example)

I am currently facing a strange error with the flower retraining example (https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html).

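TensorFlow release 0.9 was installed from source, and I tried to run the image_retraining Python script. It does start and creates some bottlenecks, but then fails with the error message below.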

Does anyone know what the problem might be? I have not found any similar posts.

E tensorflow/core/kernels/check_numerics_op.cc:157] abnormal_detected_host @0x10007200300 = {1, 0} activation input is not finite.
Traceback (most recent call last):
  File "examples/image_retraining/retrain.py", line 888, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "examples/image_retraining/retrain.py", line 798, in main
    jpeg_data_tensor, bottleneck_tensor)
  File "examples/image_retraining/retrain.py", line 456, in cache_bottlenecks
    jpeg_data_tensor, bottleneck_tensor)
  File "examples/image_retraining/retrain.py", line 414, in get_or_create_bottleneck
    bottleneck_tensor)
  File "examples/image_retraining/retrain.py", line 331, in run_bottleneck_on_image
    {image_data_tensor: image_data})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 382, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 655, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 723, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 743, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: activation input is not finite. : Tensor had NaN values
         [[Node: conv_1/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="activation input is not finite.", _device="/job:localhost/replica:0/task:0/gpu:0"](conv_1/batchnorm)]]
Caused by op u'conv_1/CheckNumerics', defined at:
  File "examples/image_retraining/retrain.py", line 888, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "examples/image_retraining/retrain.py", line 769, in main
    create_inception_graph())
  File "examples/image_retraining/retrain.py", line 312, in create_inception_graph
    RESIZED_INPUT_TENSOR_NAME]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 274, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2297, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1231, in __init__
    self._traceback = _extract_stack()

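Update: As a follow-up, TensorFlow 1.6 is recommended, because many operations are much faster. If you are running an Nvidia GPU, make sure to install CUDA 9.0 and not 9.1; 9.1 will break everything.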

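cuDNN needs to match both CUDA 9.0 and the version that TensorFlow was built against. For TensorFlow 1.6, be sure to install cuDNN 7.0.4, not 7.1, and specifically the build that 1.6 was compiled with (otherwise it will break as well): the exact version is cuDNN v7.0.4.31-1 for CUDA 9.0 (not for 9.1). The latest version (7.1.2 at the time of writing) will throw an error, because TensorFlow 1.6 was built with 7.0.4.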
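The version pairing described above can be written down as a small check. This is only a sketch: the helper names `parse_version` and `matches_tf16_build` are hypothetical, and the required versions are simply the ones stated in this answer (CUDA 9.0, cuDNN 7.0.4), not values queried from TensorFlow itself.

```python
# Versions TensorFlow 1.6 was built against, as stated in this answer.
REQUIRED_CUDA = (9, 0)
REQUIRED_CUDNN = (7, 0, 4)


def parse_version(text, parts):
    """Return the first `parts` numeric components of a version string."""
    cleaned = text.strip().lstrip("vV")
    # Treat a trailing package revision such as "-1" as one more component.
    fields = cleaned.replace("-", ".").split(".")
    return tuple(int(field) for field in fields[:parts])


def matches_tf16_build(cuda_version, cudnn_version):
    """Hypothetical helper: True only for the CUDA/cuDNN pair named above."""
    return (parse_version(cuda_version, 2) == REQUIRED_CUDA
            and parse_version(cudnn_version, 3) == REQUIRED_CUDNN)
```

For example, `matches_tf16_build("9.0", "7.1.2")` is false, which corresponds to the cuDNN 7.1.2 failure mentioned above.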

Original post: This is a bug in TensorFlow that I ran into as well (I am using 2x GTX 1080 on Ubuntu 14.04).

One option is to install CUDA 8.0. However, CUDA 8.0 is not fully supported, and you may run into other issues.

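If you are just experimenting, another way to work around the problem is to build the example and run it on the CPU only, at least for the bottleneck generation stage.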

bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/flower_photos

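As you may know, if you have built TensorFlow with GPU support and you run the script directly:

python tensorflow/examples/image_retraining/retrain.py --image_dir ~/flower_photos

it will run with GPU support, and you will probably hit the same error.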
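A possible alternative to a CPU-only build, which is not part of the original answer, is to hide the GPUs from the process by setting the `CUDA_VISIBLE_DEVICES` environment variable to an empty string before launching the script. The sketch below only builds the environment and the command line; the helper names are hypothetical, and whether this avoids the error on TensorFlow 0.9 is an assumption that has not been verified here.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which CUDA sees no GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES hides every GPU from the CUDA runtime.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def retrain_command(image_dir):
    """Build the same command line shown above for retrain.py."""
    return [
        sys.executable,
        "tensorflow/examples/image_retraining/retrain.py",
        "--image_dir",
        os.path.expanduser(image_dir),
    ]


def run_retrain_on_cpu(image_dir):
    """Launch retrain.py with the GPUs hidden and return its exit code."""
    return subprocess.call(retrain_command(image_dir), env=cpu_only_env())
```

Calling `run_retrain_on_cpu("~/flower_photos")` from the TensorFlow source directory would then start the script with no GPU visible to it.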

I opened an issue here: https://github.com/tensorflow/tensorflow/issues/3560

Until it is fixed, the workaround works fine as long as you do not have a large number of categories to classify.

