Protobuf object gets corrupted when passed to a dynamic library

I'm working on the integration of my library with a deep learning framework and I've run into some memory issues. I suspect that protobuf is the problem here, but I wanted to ask for opinions and some help because I've already spent too much time on it. In short, the framework operates on deep learning models in ONNX format. It reads them into memory as onnx::ModelProto objects. Those objects are then passed to my library, where they get transformed (and optimized) into my custom representation and returned to the framework. onnx::ModelProto is a C++ class generated with protoc from https://github.com/onnx/onnx/blob/master/onnx/onnx.proto - a regular protobuf message.

The problem occurs when the ModelProto reaches my library. The main member of the ModelProto is the graph, which is a pointer: onnx::GraphProto* onnx::ModelProto::graph_. When the object is passed to my library, the graph pointer appears to be set to a different address, which is not a valid GraphProto object location:

framework:
model_proto: 0x2ccb450
graph address: 0x2cc1d20
---
mylib:
model_proto: 0x2ccb450
graph address: 0x7fb6529c2560

The annoying thing is that it only happens in Release builds. When I compile both in Debug, it works correctly.

Also, before this error popped up, I was passing the ModelProto object to my library using a std::stringstream: I first serialized the model to a string in the framework, created a stream out of it, and deserialized it in my library. The graph was getting corrupted there too, right after the deserialization finished, and it was so bad that I was getting segfaults further down in my code.

Could this have anything to do with the fact that both the framework and my library link statically with their own copies of protobuf? Protobuf is added as a dependency and compiled with both the framework and my library. I made sure that I use the same version (it's 3.11 at the moment). I also use the same ONNX version (1.6).

Here's how the dependencies and the workflow look:

[image: dependencies]

Since there is no standard ABI in C++, the bar for passing objects between libraries built separately is quite high.

The whole reason for using protobuf is to convert the objects to strings and then exchange those character arrays between the two endpoints. That way you resolve all the issues around objects having different layouts, formats, precisions, and endianness.
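As a rough illustration of that idea, here is a minimal sketch of a library boundary that only ever sees serialized bytes. Everything in it is an assumption made for illustration: ToyModel is a hypothetical stand-in for a generated class such as onnx::ModelProto, its Serialize/Parse methods stand in for protobuf's SerializeToString and ParseFromArray, and mylib_import_model is an invented entry point, not part of any real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a protoc-generated message (e.g. onnx::ModelProto).
struct ToyModel {
    std::int64_t ir_version = 0;
    std::string producer_name;

    // Stand-in for SerializeToString: 8 little-endian bytes, then the name.
    std::string Serialize() const {
        std::string out;
        const std::uint64_t v = static_cast<std::uint64_t>(ir_version);
        for (int i = 0; i < 8; ++i) {
            out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
        }
        out += producer_name;
        return out;
    }

    // Stand-in for ParseFromArray: rebuilds the object from raw bytes.
    bool Parse(const char* data, std::size_t size) {
        if (data == nullptr || size < 8) return false;
        std::uint64_t v = 0;
        for (int i = 0; i < 8; ++i) {
            v |= static_cast<std::uint64_t>(static_cast<unsigned char>(data[i])) << (8 * i);
        }
        ir_version = static_cast<std::int64_t>(v);
        producer_name.assign(data + 8, size - 8);
        return true;
    }
};

// Library side (hypothetical entry point): a C-linkage function that receives
// only a byte buffer, so no C++ object layout crosses the boundary.
extern "C" int mylib_import_model(const char* data, std::size_t size,
                                  long long* ir_version_out) {
    ToyModel model;  // constructed entirely with the library's own code
    if (ir_version_out == nullptr || !model.Parse(data, size)) return 1;
    *ir_version_out = model.ir_version;
    return 0;  // 0 means success
}

// Framework side: serialize with its own copy of the class, pass bytes only.
long long framework_roundtrip(std::int64_t ir_version, const std::string& producer) {
    ToyModel model;
    model.ir_version = ir_version;
    model.producer_name = producer;
    const std::string bytes = model.Serialize();
    long long seen_by_library = 0;
    const int status = mylib_import_model(bytes.data(), bytes.size(), &seen_by_library);
    assert(status == 0);
    return seen_by_library;
}
```

The point of the design is that each side builds and destroys its own objects with its own copy of the code; only the buffer pointer and length are shared.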

If you absolutely want to pass pointers around, the build settings must be identical. Everything. All compiler and linker versions, settings, all #defines, optimization levels, etc. It's a path that is very tough to follow and makes for a brittle solution.

I think I've found a solution but I'm still not 100% certain what the root cause is.

The ONNX library lets you customize the namespace in which the generated classes will reside: https://github.com/onnx/onnx/blob/master/CMakeLists.txt#L76-L78

I set it to an arbitrary value in my lib and that finally fixed the problem. I've switched back to the istringstream version and it seems to work. It has passed many CI checks, so things look good so far.
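To make the effect of the namespace change concrete, here is a small sketch of what it does at the C++ level. It does not establish the root cause, which remains unconfirmed. One plausible reading is that two copies of identically named generated classes can clash when both binaries are loaded into the same process, and that moving one copy into its own namespace gives it distinct types and symbol names. The macro DEMO_ONNX_NAMESPACE, the value mylib_onnx, and the simplified ModelProto structs are all hypothetical stand-ins, not the real generated code.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <typeinfo>

// Hypothetical stand-in for the namespace macro controlled by the build
// option; "mylib_onnx" is an arbitrary illustrative value.
#ifndef DEMO_ONNX_NAMESPACE
#define DEMO_ONNX_NAMESPACE mylib_onnx
#endif

// The framework's copy of the generated class, in the default namespace.
namespace onnx {
struct ModelProto {
    void* graph_ = nullptr;
    static const char* Owner() { return "framework"; }
};
}  // namespace onnx

// The library's copy, generated into the customized namespace.
namespace DEMO_ONNX_NAMESPACE {
struct ModelProto {
    void* graph_ = nullptr;
    static const char* Owner() { return "mylib"; }
};
}  // namespace DEMO_ONNX_NAMESPACE

// With different namespaces the two copies are unrelated types, so their
// member functions get different mangled names.
static_assert(!std::is_same<onnx::ModelProto, DEMO_ONNX_NAMESPACE::ModelProto>::value,
              "the two copies must be distinct types");

bool copies_are_distinct() {
    return typeid(onnx::ModelProto) != typeid(DEMO_ONNX_NAMESPACE::ModelProto) &&
           std::string(onnx::ModelProto::Owner()) != DEMO_ONNX_NAMESPACE::ModelProto::Owner();
}
```

A side effect is that a framework object can no longer be handed to the library by pointer by accident, because the types no longer match; the model has to go through serialization, as in the istringstream version.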
