Training SSD gives ValueError: Can't load save_path when it is None
I am using Google Colab to train my SSD model. This is the stack trace of my error:
Traceback (most recent call last):
File "train_ssd_network.py", line 394, in <module>
tf.app.run()
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "train_ssd_network.py", line 390, in main
sync_optimizer=None)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 753, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1014, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 839, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1003, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 734, in prepare_or_wait_for_session
init_fn=self._init_fn)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/session_manager.py", line 298, in prepare_session
init_fn(sess)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/framework/python/ops/variables.py", line 761, in callback
saver.restore(session, model_path)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1277, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
raise File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv)) File "train_ssd_network.py", line 390, in main
sync_optimizer=None) File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
should_retry = True File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
==================================
E1002 15:10:16.652289 140269098841984 tf_should_use.py:76] ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
raise File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv)) File "train_ssd_network.py", line 390, in main
sync_optimizer=None) File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
should_retry = True File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
==================================
I understand that there is an issue with the train_ssd_network.py file, but what exactly is the issue here?
Here is an image of the checkpoint folder:
I read Stack Overflow questions that mention this could be a checkpoint-related issue. However, I do have a checkpoint folder containing the unzipped ssd_300_vgg.ckpt, which in turn contains two files. The file was downloaded directly from the author's repository.
Other answers state the following:
The error just means tf.train.latest_checkpoint didn't find anything. It returns None, then the Saver complains because it was passed None. So there's no checkpoint in that directory.
tf.app.flags.DEFINE_string(
'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt',
'The path to a checkpoint from which to fine-tune.')
tf.app.flags.DEFINE_string(
'checkpoint_model_scope', None,
'Model scope in the checkpoint. None if the same as the trained model.')
tf.app.flags.DEFINE_string(
'checkpoint_exclude_scopes', None,
'Comma-separated list of scopes of variables to exclude when restoring '
'from a checkpoint.')
tf.app.flags.DEFINE_string(
'trainable_scopes', None,
'Comma-separated list of scopes to filter the set of variables to train.'
'By default, None would train all the variables.')
tf.app.flags.DEFINE_boolean(
'ignore_missing_vars', False,
'When restoring a checkpoint would ignore missing variables.')
FLAGS = tf.app.flags.FLAGS
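One way to test what the quoted answer describes is to check whether the value given to checkpoint_path is a checkpoint prefix or only a folder. The following is a minimal sketch, not part of the training script. It assumes the usual TensorFlow 1.x layout of `<prefix>.index` plus one or more `<prefix>.data-*` shards, and that tf.train.latest_checkpoint relies on a file named `checkpoint` inside the directory it is given. The helper names `is_checkpoint_prefix` and `has_checkpoint_state` are hypothetical.

```python
import glob
import os


def is_checkpoint_prefix(path):
    """Return True if `path` looks like a TF1-style checkpoint prefix.

    Assumption: a checkpoint is stored as `<path>.index` plus at least one
    `<path>.data-*` shard. Hypothetical helper, not a TensorFlow API.
    """
    has_index = os.path.isfile(path + ".index")
    has_data = bool(glob.glob(glob.escape(path) + ".data-*"))
    return has_index and has_data


def has_checkpoint_state(directory):
    """Return True if `directory` contains the `checkpoint` state file
    that tf.train.latest_checkpoint reads to find the newest checkpoint."""
    return os.path.isfile(os.path.join(directory, "checkpoint"))
```

If `is_checkpoint_prefix(path)` is False and `has_checkpoint_state(path)` is also False for the path in the flag, there is nothing at that location for the Saver to restore, which would be consistent with the None in the traceback.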
How can I solve this issue?
For anyone having this issue: unzip the files in the checkpoint folder as shown in the image, then check your training script. The checkpoint path is most likely wrong.
I changed the following:
tf.app.flags.DEFINE_string(
'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt',
'The path to a checkpoint from which to fine-tune.')
and
CHECKPOINT_PATH='/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt'
Here, CHECKPOINT_PATH contains 'ssd_300_vgg.ckpt' only once, whereas the default in tf.app.flags.DEFINE_string contains 'ssd_300_vgg.ckpt/ssd_300_vgg.ckpt', so that it points at the checkpoint files inside the unzipped folder.
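To avoid guessing the doubled path by hand, the prefix can be derived from the contents of the unzipped folder. This is a minimal sketch under the assumption that the folder holds files named `<prefix>.index` and `<prefix>.data-*`. The helper `find_checkpoint_prefixes` is hypothetical and uses only the standard library.

```python
import os


def find_checkpoint_prefixes(directory):
    """List checkpoint prefixes found directly inside `directory`.

    Assumption: every `<name>.index` file marks one checkpoint, and the
    prefix is that file's path with the `.index` suffix removed.
    Hypothetical helper, not a TensorFlow API.
    """
    suffix = ".index"
    prefixes = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            prefixes.append(os.path.join(directory, name[: -len(suffix)]))
    return prefixes
```

Called on the unzipped folder, this returns the paths that can be used as the checkpoint_path value. For a folder named ssd_300_vgg.ckpt that contains ssd_300_vgg.ckpt.index, the result ends in 'ssd_300_vgg.ckpt/ssd_300_vgg.ckpt'.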