Training SSD gives ValueError: Can't load save_path when it is None

I am using Google Colab to train my SSD model. This is the stack trace of the error:

Traceback (most recent call last):
  File "train_ssd_network.py", line 394, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train_ssd_network.py", line 390, in main
    sync_optimizer=None)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 753, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1014, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 839, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1003, in managed_session
    start_standard_services=start_standard_services)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 734, in prepare_or_wait_for_session
    init_fn=self._init_fn)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/session_manager.py", line 298, in prepare_session
    init_fn(sess)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/framework/python/ops/variables.py", line 761, in callback
    saver.restore(session, model_path)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1277, in restore
    raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
    raise  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))  File "train_ssd_network.py", line 390, in main
    sync_optimizer=None)  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
    should_retry = True  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
==================================
E1002 15:10:16.652289 140269098841984 tf_should_use.py:76] ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
    raise  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))  File "train_ssd_network.py", line 390, in main
    sync_optimizer=None)  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
    should_retry = True  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
==================================

I understand that there is an issue with the train_ssd_network.py file, but what exactly is the issue here?

Here is an image of the checkpoint folder: (image not shown)

I read Stack Overflow questions suggesting that this could be a checkpoint-related issue. However, I do have a checkpoint folder containing the unzipped ssd_300_vgg.ckpt, which in turn contains two files. It was downloaded directly from the author's repository.

Other answers say the following:

The error just means tf.train.latest_checkpoint didn't find anything. It returns None, then the Saver complains because it was passed None. So there's no checkpoint in that directory.
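To make the quoted explanation concrete: as far as I know, tf.train.latest_checkpoint does not scan a directory for weight files. It reads a small bookkeeping file named "checkpoint" that the Saver writes during training, and a freshly downloaded, unzipped checkpoint usually has no such file. The sketch below only checks for that file with the standard library; the helper name has_checkpoint_state_file is made up for illustration and is not part of TensorFlow.

```python
import os


def has_checkpoint_state_file(directory):
    """Return True if `directory` holds the 'checkpoint' bookkeeping file.

    Assumption: this is the file tf.train.latest_checkpoint relies on.
    Without it, the lookup has nothing to read and yields None.
    """
    return os.path.isfile(os.path.join(directory, "checkpoint"))
```

If this returns False for the directory passed as checkpoint_path, pointing the flag at the checkpoint prefix itself, rather than at the directory, is the usual way around the lookup.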

tf.app.flags.DEFINE_string(
    'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt',
    'The path to a checkpoint from which to fine-tune.')
tf.app.flags.DEFINE_string(
    'checkpoint_model_scope', None,
    'Model scope in the checkpoint. None if the same as the trained model.')
tf.app.flags.DEFINE_string(
    'checkpoint_exclude_scopes', None,
    'Comma-separated list of scopes of variables to exclude when restoring '
    'from a checkpoint.')
tf.app.flags.DEFINE_string(
    'trainable_scopes', None,
    'Comma-separated list of scopes to filter the set of variables to train.'
    'By default, None would train all the variables.')
tf.app.flags.DEFINE_boolean(
    'ignore_missing_vars', False,
    'When restoring a checkpoint would ignore missing variables.')

FLAGS = tf.app.flags.FLAGS

How can I solve this issue?

For anyone having this issue: unzip the files in the checkpoint folder as shown in the image, then check the checkpoint path in your training script. The checkpoint path is most likely the problem.

I changed the following:

tf.app.flags.DEFINE_string(
    'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt',
    'The path to a checkpoint from which to fine-tune.')

and

CHECKPOINT_PATH='/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt'

Here, CHECKPOINT_PATH contains 'ssd_300_vgg.ckpt' only once, whereas the default in tf.app.flags.DEFINE_string contains it twice: 'ssd_300_vgg.ckpt/ssd_300_vgg.ckpt'.
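A likely reason the doubled path works, stated as an assumption rather than something confirmed above: a TensorFlow checkpoint path is a prefix, not a file. A checkpoint named P is typically stored as P.index plus one or more P.data-* files, so when the unzipped folder is itself called ssd_300_vgg.ckpt, the prefix is the folder name followed by the same name again. The sketch below lists the prefixes found in a folder using only the standard library; find_checkpoint_prefixes is a hypothetical helper, not a TensorFlow function.

```python
import glob
import os


def find_checkpoint_prefixes(directory):
    """List checkpoint prefixes in `directory`.

    Assumes the common layout where a checkpoint P is saved as
    'P.index' plus 'P.data-*' shard files; P itself is not a file.
    """
    prefixes = []
    pattern = os.path.join(glob.escape(directory), "*.index")
    for index_file in sorted(glob.glob(pattern)):
        prefix = index_file[: -len(".index")]
        # Keep the prefix only if at least one data shard sits next to it.
        if glob.glob(glob.escape(prefix) + ".data-*"):
            prefixes.append(prefix)
    return prefixes
```

Calling it on the unzipped folder, for example find_checkpoint_prefixes('/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt'), shows which value to use for checkpoint_path. An empty list suggests the files are not where the script expects them.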
