Error converting FaceNet model to ONNX format

Question

System information

OS Platform and Distribution: Linux Ubuntu 19.10
TensorFlow version: 1.15
Python version: 3.7

Issue

I downloaded a TensorFlow model of FaceNet from this page, and I'm trying to convert it from a .pb into a .onnx file; however, it raises the following error:

To Reproduce

root@xesk-VirtualBox:/home/xesk/Desktop# python -m tf2onnx.convert --saved-model home/xesk/Desktop/2s/20180402-114759/20180402-114759.pb --output model.onnx

    2020-08-03 20:18:05.081538: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
    2020-08-03 20:18:05.081680: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    2020-08-03 20:18:07,431 - WARNING - '--tag' not specified for saved_model. Using --tag serve
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.8/dist-packages/tf2onnx/convert.py", line 171, in <module>
        main()
      File "/usr/local/lib/python3.8/dist-packages/tf2onnx/convert.py", line 131, in main
        graph_def, inputs, outputs = tf_loader.from_saved_model(
      File "/usr/local/lib/python3.8/dist-packages/tf2onnx/tf_loader.py", line 288, in from_saved_model
        _from_saved_model_v2(model_path, input_names, output_names, tag, signatures, concrete_function)
      File "/usr/local/lib/python3.8/dist-packages/tf2onnx/tf_loader.py", line 247, in _from_saved_model_v2
        imported = tf.saved_model.load(model_path, tags=tag) # pylint: disable=no-value-for-parameter
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/load.py", line 603, in load
        return load_internal(export_dir, tags, options)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/load.py", line 614, in load_internal
        loader_impl.parse_saved_model_with_debug_info(export_dir))
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 56, in parse_saved_model_with_debug_info
        saved_model = _parse_saved_model(export_dir)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 110, in parse_saved_model
        raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
    OSError: SavedModel file does not exist at: home/xesk/Desktop/2s/20180402-114759/20180402-114759.pb/{saved_model.pbtxt|saved_model.pb}

Additional context

I'm not running CUDA or anything similar, only the CPU. The downloaded model is 20180402-114759. It's the first time I'm working with these tools, and I'm a bit of a beginner in this AI world, so I might be missing something obvious. Of course, I checked the path and the command syntax several times. Might it have something to do with the format of the files I downloaded?

EDIT

Following Venkatesh Wadawadagi's answer, I'm going for Option 1. Renaming the .meta file solved the problem of the script not recognising it.

The script runs more or less correctly and finishes creating the export_dir directory, with export_dir > 0 > variables subfolders. However, they are empty.

The console output is:

xesk@xesk:~/Desktop/UP2S/ACROMEGALLY/20180402-114759$ python3 ./pb2sm
2020-08-10 16:02:26.128846: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-10 16:02:26.129114: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-10 16:02:26.129137: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (xesk): /proc/driver/nvidia/version does not exist
2020-08-10 16:02:26.129501: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-08-10 16:02:26.139076: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz
2020-08-10 16:02:26.139506: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44018d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:02:26.139520: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/queue_runner_impl.py:391: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
2020-08-10 16:02:32.681265: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17676288 exceeds 10% of system memory.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta/Adam
     [[{{node save/SaveV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./pb2sm", line 17, in <module>
    strip_default_attrs=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/saved_model/builder_impl.py", line 595, in add_meta_graph_and_variables
    saver.save(sess, variables_path, write_meta_graph=False, write_state=False)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1193, in save
    raise exc
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1176, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta/Adam
     [[node save/SaveV2_1 (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/SaveV2_1':
  File "./pb2sm", line 17, in <module>
    strip_default_attrs=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/saved_model/builder_impl.py", line 589, in add_meta_graph_and_variables
    saver = self._maybe_create_saver(saver)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/saved_model/builder_impl.py", line 227, in _maybe_create_saver
    allow_empty=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 499, in _build_internal
    save_tensor = self._AddShardedSaveOps(filename_tensor, per_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 291, in _AddShardedSaveOps
    return self._AddShardedSaveOpsForV2(filename_tensor, per_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 265, in _AddShardedSaveOpsForV2
    sharded_saves.append(self._AddSaveOps(sharded_filename, saveables))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 206, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 122, in save_op
    tensors)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1946, in save_v2
    name=name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Is it possible that I'm missing some library? It seems to have something to do with a CUDA implementation, which I don't have. Could that be it?

Answer 1

The command you're using:

python -m tf2onnx.convert --saved-model home/xesk/Desktop/2s/20180402-114759/20180402-114759.pb --output model.onnx

Note that the FaceNet trained model you're using contains only a frozen graph (.pb file) and a checkpoint (.ckpt files); it does not contain the SavedModel that your command is looking for.

So you are passing the path to the .pb file of the frozen graph, which is different from the .pb file of a SavedModel (which you don't have). A SavedModel directory contains a variables folder along with the saved_model.pb file.

That's why you get the error:

OSError: SavedModel file does not exist

Read more about SavedModel here.
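The OSError in the question names the two files the SavedModel loader expects inside the directory it is given. A quick pre-flight check can tell a SavedModel directory apart from a lone .pb file such as a frozen graph. This is a minimal stdlib sketch; `describe_model_path` and its return labels are hypothetical, not part of tf2onnx or TensorFlow:

```python
from pathlib import Path

# File names the SavedModel loader looks for inside the export directory
# (taken from the OSError message in the question).
SAVED_MODEL_FILENAMES = ("saved_model.pbtxt", "saved_model.pb")


def describe_model_path(path):
    """Classify a path before handing it to tf2onnx (hypothetical helper).

    Returns "saved_model_dir", "pb_file", "missing" or "other".
    """
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.is_dir():
        if any((p / name).is_file() for name in SAVED_MODEL_FILENAMES):
            return "saved_model_dir"
        return "other"
    if p.suffix == ".pb":
        # A lone .pb file is most likely a frozen GraphDef, which is not
        # something --saved-model can load.
        return "pb_file"
    return "other"
```

For the file in the question, a result of "pb_file" points towards the `--graphdef` conversion described below rather than `--saved-model`.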

To proceed with the ONNX conversion, you have two options:

Option 1: Convert the checkpoint to a SavedModel:

Use the following code for that:

import os
import tensorflow as tf

trained_checkpoint_prefix = 'model-20180402-114759.ckpt-275'
export_dir = os.path.join('export_dir', '0')

graph = tf.Graph()
with tf.compat.v1.Session(graph=graph) as sess:
    # Restore from checkpoint
    loader = tf.compat.v1.train.import_meta_graph(trained_checkpoint_prefix + '.meta')
    loader.restore(sess, trained_checkpoint_prefix)

    # Export checkpoint to SavedModel
    builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(sess,
                                         [tf.saved_model.TRAINING, tf.saved_model.SERVING],
                                         strip_default_attrs=True)
    builder.save()

Note: this code works only if the .data, .index and .meta files share the same prefix, so rename the .meta file:

mv model-20180402-114759.meta model-20180402-114759.ckpt-275.meta
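The renaming step can also be scripted. This is a minimal sketch, assuming the directory holds exactly one checkpoint (one `.index` file) and one `.meta` file, as in the 20180402-114759 download; `align_meta_with_checkpoint` is a hypothetical helper name:

```python
from pathlib import Path


def align_meta_with_checkpoint(model_dir):
    """Rename the single .meta file to <checkpoint prefix>.meta (hypothetical helper).

    The prefix is derived from the .index file, e.g.
    'model-20180402-114759.ckpt-275.index' -> 'model-20180402-114759.ckpt-275'.
    Returns the (possibly new) path of the .meta file.
    """
    model_dir = Path(model_dir)
    index_files = sorted(model_dir.glob("*.index"))
    meta_files = sorted(model_dir.glob("*.meta"))
    if len(index_files) != 1 or len(meta_files) != 1:
        raise ValueError("expected exactly one .index and one .meta file in %s" % model_dir)

    prefix = index_files[0].name[: -len(".index")]
    target = model_dir / (prefix + ".meta")
    if meta_files[0] != target:
        # Same effect as the mv command above.
        meta_files[0].rename(target)
    return target
```

Running it a second time is harmless: once the names already match, nothing is renamed.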

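The FailedPreconditionError in the question's EDIT names `.../BatchNorm/beta/Adam`, which looks like an Adam optimizer slot variable. One plausible reading (an assumption, not stated in this answer) is that the meta graph defines variables the checkpoint never stored, so `loader.restore` leaves them uninitialized and `add_meta_graph_and_variables` fails when it tries to save them. The stdlib helpers below list such variables from two name lists collected inside the session: `tf.train.list_variables(trained_checkpoint_prefix)` for the checkpoint and `tf.compat.v1.global_variables()` for the imported graph. Both helper names are hypothetical:

```python
from collections import Counter


def missing_from_checkpoint(graph_var_names, checkpoint_var_names):
    """Return graph variables that the checkpoint does not store (hypothetical helper).

    graph_var_names: e.g. [v.op.name for v in tf.compat.v1.global_variables()]
    checkpoint_var_names: e.g. [name for name, _ in tf.train.list_variables(prefix)]
    """
    stored = set(checkpoint_var_names)
    return sorted(name for name in graph_var_names if name not in stored)


def count_by_suffix(names):
    """Count names by their last path component, e.g. 'Adam' or 'Adam_1'."""
    return dict(Counter(name.rsplit("/", 1)[-1] for name in names))
```

If the list is non-empty, one workaround worth trying (untested here) is to run `sess.run(tf.compat.v1.global_variables_initializer())` right after `import_meta_graph` and before `loader.restore`. The extra variables then hold their initial values when the SavedModel is written, while the restored weights still come from the checkpoint.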

Option 2: Use the ckpt files or the frozen graph (.pb) for the ONNX conversion

From checkpoint format:

python -m tf2onnx.convert --checkpoint tensorflow-model-meta-file-path --output model.onnx --inputs input0:0,input1:0 --outputs output0:0

From graphdef/frozen-graph format:

python -m tf2onnx.convert --graphdef tensorflow-model-graphdef-file --output model.onnx --inputs input0:0,input1:0 --outputs output0:0

If your TensorFlow model is in a format other than SavedModel, you need to provide the inputs and outputs of the model graph.
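Because the checkpoint and graphdef routes need explicit `--inputs` and `--outputs`, it can help to assemble the command programmatically so a missing tensor name is caught before tf2onnx runs. This sketch uses only the flags shown in this answer; `build_tf2onnx_command` is a hypothetical name, and `input0:0`/`output0:0` are the same placeholders as above, to be replaced with your graph's real tensor names:

```python
import sys

# Formats and flag names as they appear in the tf2onnx commands above.
SUPPORTED_FORMATS = ("saved-model", "checkpoint", "graphdef")


def build_tf2onnx_command(fmt, model_path, output="model.onnx", inputs=None, outputs=None):
    """Assemble an argv list for `python -m tf2onnx.convert` (hypothetical helper).

    fmt: "saved-model", "checkpoint" or "graphdef".
    inputs/outputs: tensor names such as "input0:0"; required for checkpoint
    and graphdef models, optional for SavedModels.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unknown format: %r" % (fmt,))
    if fmt != "saved-model" and not (inputs and outputs):
        raise ValueError("--%s models need explicit inputs and outputs" % fmt)

    cmd = [sys.executable, "-m", "tf2onnx.convert", "--" + fmt, model_path, "--output", output]
    if inputs:
        cmd += ["--inputs", ",".join(inputs)]
    if outputs:
        cmd += ["--outputs", ",".join(outputs)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell-quoting issues with paths.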

From the tf2onnx documentation:

If your model is in checkpoint or graphdef format and you do not know the input and output nodes of the model, you can use the summarize_graph TensorFlow utility. The summarize_graph tool does need to be downloaded and built from source. If you have the option of going to your model provider and obtaining the model in SavedModel format, then we recommend doing so.
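If building summarize_graph is inconvenient, a rough heuristic in a similar spirit fits in a few lines: placeholder nodes are candidate inputs, and nodes whose outputs nothing else consumes are candidate outputs. This is a sketch, not a reimplementation of summarize_graph. It accepts any iterable of objects with `.name`, `.op` and `.input` attributes, which matches the NodeDef entries of a TensorFlow GraphDef (`graph_def.node`); `guess_inputs_and_outputs` is a hypothetical name, and appending `:0` assumes you want each node's first output:

```python
# Op types treated as graph inputs.
INPUT_OPS = {"Placeholder", "PlaceholderWithDefault"}
# Op types ignored as outputs even when nothing consumes them.
IGNORED_OUTPUT_OPS = {"Const", "NoOp", "Placeholder", "PlaceholderWithDefault"}


def _producer(input_ref):
    """Map a NodeDef input string to the producing node's name.

    'conv/BiasAdd:1' -> 'conv/BiasAdd'; '^init' (control dependency) -> 'init'.
    """
    return input_ref.lstrip("^").split(":")[0]


def guess_inputs_and_outputs(nodes):
    """Guess input and output tensor names from NodeDef-like objects (hypothetical helper)."""
    nodes = list(nodes)
    consumed = {_producer(ref) for node in nodes for ref in node.input}
    inputs = [node.name + ":0" for node in nodes if node.op in INPUT_OPS]
    outputs = [
        node.name + ":0"
        for node in nodes
        if node.name not in consumed and node.op not in IGNORED_OUTPUT_OPS
    ]
    return inputs, outputs
```

With TensorFlow installed, the nodes can come from a frozen graph loaded via `tf.compat.v1.GraphDef()` and its `ParseFromString` method; treat the result as a starting point to verify, not a definitive answer.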

Answer 2

I have encountered a similar error. In my case, I made the mistake of passing the .pb file instead of path/to/savedmodel, which should be the path to the directory containing saved_model.pb. So, assuming your 20180402-114759.pb is in the directory home/xesk/Desktop/2s/20180402-114759, the command should be:

python -m tf2onnx.convert --saved-model home/xesk/Desktop/2s/20180402-114759 --output model.onnx

Please refer to "Getting Started Converting TensorFlow to ONNX" and "Using the SavedModel format" for more information.
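Answer 2's point, that `--saved-model` expects the directory rather than the `.pb` file inside it, can be enforced with a small normalization step. A minimal sketch; `saved_model_dir_from` is a hypothetical helper. It only accepts a real SavedModel (a directory containing `saved_model.pb` or `saved_model.pbtxt`, or one of those files). Per Answer 1, a standalone frozen graph such as `20180402-114759.pb` is not a SavedModel, so the sketch rejects it:

```python
from pathlib import Path

# File names the SavedModel loader looks for (see the OSError in the question).
SAVED_MODEL_FILENAMES = ("saved_model.pb", "saved_model.pbtxt")


def saved_model_dir_from(path):
    """Return the directory to pass to `--saved-model` (hypothetical helper)."""
    p = Path(path)
    if p.is_file() and p.name in SAVED_MODEL_FILENAMES:
        # Pointed at the file itself: tf2onnx wants the containing directory.
        return p.parent
    if p.is_dir() and any((p / name).is_file() for name in SAVED_MODEL_FILENAMES):
        return p
    raise ValueError(
        "%s is not a SavedModel; a standalone .pb is probably a frozen graph "
        "(convert it with --graphdef instead)" % p
    )
```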


2 answers

Answer 1
1 2020-08-03 21:27:54

Answer 2
0 2023-01-02 18:57:02
