ValueError: Can't load save_path when it is None while training SSD

Question

I am using Google Colab to train my SSD model. This is the stack trace of my error:

Traceback (most recent call last):
  File "train_ssd_network.py", line 394, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train_ssd_network.py", line 390, in main
    sync_optimizer=None)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 753, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1014, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 839, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1003, in managed_session
    start_standard_services=start_standard_services)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 734, in prepare_or_wait_for_session
    init_fn=self._init_fn)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/session_manager.py", line 298, in prepare_session
    init_fn(sess)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/framework/python/ops/variables.py", line 761, in callback
    saver.restore(session, model_path)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1277, in restore
    raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
    raise  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))  File "train_ssd_network.py", line 390, in main
    sync_optimizer=None)  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
    should_retry = True  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
==================================
E1002 15:10:16.652289 140269098841984 tf_should_use.py:76] ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
    raise  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))  File "train_ssd_network.py", line 390, in main
    sync_optimizer=None)  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
    should_retry = True  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
==================================

I understand that there is an issue with the train_ssd_network.py file, but what exactly is the issue?

Here is an image of the checkpoints:

I read StackOverflow questions that mentioned this could be a checkpoint-related issue. However, I do have a checkpoint folder containing the unzipped ssd_300_vgg.ckpt, which in turn contains two files. This file was downloaded directly from the author's repository.

Other answers state the following:

The error just means tf.train.latest_checkpoint didn't find anything. It returns None, then the Saver complains because it was passed None. So there's no checkpoint in that directory.

tf.app.flags.DEFINE_string(
    'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt',
    'The path to a checkpoint from which to fine-tune.')
tf.app.flags.DEFINE_string(
    'checkpoint_model_scope', None,
    'Model scope in the checkpoint. None if the same as the trained model.')
tf.app.flags.DEFINE_string(
    'checkpoint_exclude_scopes', None,
    'Comma-separated list of scopes of variables to exclude when restoring '
    'from a checkpoint.')
tf.app.flags.DEFINE_string(
    'trainable_scopes', None,
    'Comma-separated list of scopes to filter the set of variables to train. '
    'By default, None would train all the variables.')
tf.app.flags.DEFINE_boolean(
    'ignore_missing_vars', False,
    'When restoring a checkpoint would ignore missing variables.')

FLAGS = tf.app.flags.FLAGS

How can I solve this issue?

Answer 1

For anyone having this issue: unzip the files in the checkpoint folder as shown in the image, then check your train.py file. There is most likely a path issue for the checkpoint.
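To check which kind of path you are dealing with, here is a minimal sketch using only the standard library. It assumes the checkpoint is in TensorFlow's V2 format (a '<prefix>.index' file plus one or more '<prefix>.data-*' files) and that a directory is only usable if it contains the small state file named 'checkpoint' that tf.train.latest_checkpoint reads. The function name classify_checkpoint_path is hypothetical and not part of TensorFlow or the SSD repository.

```python
import glob
import os


def classify_checkpoint_path(path):
    """Classify `path` as 'prefix', 'directory_with_state',
    'directory_without_state', or 'missing'.

    Hypothetical helper for illustration only.
    """
    # A V2-format checkpoint prefix is not a file itself; it is the common
    # stem of '<prefix>.index' and '<prefix>.data-*' files.
    has_index = os.path.isfile(path + ".index")
    has_data = bool(glob.glob(glob.escape(path) + ".data-*"))
    if has_index and has_data:
        return "prefix"
    if os.path.isdir(path):
        # Assumption: a directory is resolved through a state file named
        # 'checkpoint'; without it, the lookup has nothing to return.
        if os.path.isfile(os.path.join(path, "checkpoint")):
            return "directory_with_state"
        return "directory_without_state"
    return "missing"
```

If the unzipped ssd_300_vgg.ckpt is a folder holding only the two checkpoint files, this sketch would report 'directory_without_state' for the folder itself, which would be consistent with the lookup returning None and the Saver raising the error above.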

I changed the following:

tf.app.flags.DEFINE_string(
    'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt',
    'The path to a checkpoint from which to fine-tune.')

and

CHECKPOINT_PATH='/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt'

Here, CHECKPOINT_PATH contains 'ssd_300_vgg.ckpt' only once, whereas the default in tf.app.flags.DEFINE_string contains it twice, as 'ssd_300_vgg.ckpt/ssd_300_vgg.ckpt' (the unzipped folder, then the checkpoint prefix inside it).
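If you are not sure what the prefix inside the unzipped folder is, it can be derived from the file names. The sketch below assumes one of the files in the folder ends in '.index', which is the usual layout for a V2-format checkpoint but is not confirmed by the listing above. The helper name find_checkpoint_prefixes is hypothetical.

```python
import os


def find_checkpoint_prefixes(directory):
    """Return checkpoint prefixes in `directory`, derived from '*.index' files.

    Hypothetical helper for illustration only.
    """
    suffix = ".index"
    prefixes = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            # Strip '.index' to get the prefix that a Saver expects.
            prefixes.append(os.path.join(directory, name[: -len(suffix)]))
    return prefixes
```

Calling it on the unzipped folder should list the value to use as the default for the checkpoint_path flag, provided the assumption about the '.index' file holds. An empty list would suggest the folder does not contain a checkpoint in that format.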
