C++ binary identification (manifest)

We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. We compile an executable using these libraries and deploy the binary on the front-end. It would be extremely useful to be able to identify that binary. Ideally, we would like to have a small script that retrieves the following information directly from the binary:

$ident binary
$binary : Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
$  dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
$  dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...

So it should display all the information for the binary itself and for all of its dependencies.

Currently our approach is:

  1. During compilation, for each product we generate Manifest.h and Manifest.cpp and then inject Manifest.o into the binary.

  2. The ident script parses the target binary, finds the generated data there and prints this information.

However, this approach is not always reliable across different versions of gcc. I would like to ask the SO community: is there a better approach to solve this problem?

Thanks for any advice.

One of the catches with storing data in source code (your Manifest.h and .cpp) is the size limit for literal data, which depends on the compiler.

My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (so does objcopy). If you prefer to write your own solution, have a look at libbfd.

Let us say we have a hello.cpp containing the usual C++ "Hello world" example. Now we have the following make file (GNUmakefile):

hello: hello.o hello.om
	$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $@

%.om: %.manifest
	ld -b binary -o $@ $<

%.manifest:
	echo "$@" > $@

What I'm doing here is separating out the linking stage, because I want the manifest (after conversion to ELF object format) linked into the binary as well. Since I am using pattern rules, this is one way to go. Others are certainly possible, including a better naming scheme in which the manifests also end up as .o files and GNU make can figure out how to create those. Here I'm being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data), created from .manifest files. That recipe converts the binary input into an ELF object. The recipe for creating the .manifest itself simply writes a string (the name of the target) into the file.

Obviously the tricky part in your case isn't storing the manifest data, but rather generating it. And frankly I know too little about your build system to even attempt to suggest a recipe for the .manifest generation.

Whatever you throw into your .manifest file should probably be some structured text that can be interpreted by the script you mention, or that can even be output by the binary itself if you implement a command line switch (disregarding .so files, including .so files hacked into behaving like ordinary executables when run from the shell).
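To make "structured text" a bit more concrete, here is a minimal sketch of a parser for the Key=Value;Key=Value layout shown in the question's sample output. The function name parse_manifest_line is made up for illustration, and the rule of skipping fields without an '=' is an assumption, not something the question specifies.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: split "Key=Value;Key=Value" into a key/value map.
// Fields that contain no '=' (or have an empty key) are silently skipped.
std::map<std::string, std::string> parse_manifest_line(const std::string& line)
{
    std::map<std::string, std::string> fields;
    std::size_t pos = 0;
    while (pos <= line.size()) {
        // A field runs up to the next ';' or to the end of the line.
        std::size_t end = line.find(';', pos);
        if (end == std::string::npos) {
            end = line.size();
        }
        const std::string field = line.substr(pos, end - pos);
        const std::size_t eq = field.find('=');
        if (eq != std::string::npos && eq > 0) {
            fields[field.substr(0, eq)] = field.substr(eq + 1);
        }
        pos = end + 1;
    }
    return fields;
}
```

The same function could serve both the external script's logic and a command line switch inside the binary, since it only needs the manifest text as a string.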

The above make file doesn't take the dependencies into account, or rather it doesn't help you create the dependency list in any way. You can probably coerce GNU make into helping you with that if you express your dependencies clearly for each goal (i.e. the static libraries etc.). But it may not be worth it to take that route ...

If you want particular names for the symbols generated from the data (in your case the manifest), you need to take a slightly different route and use the method described by John Ripley here.

How to access the symbols? Easy. Declare them as external (C linkage!) data and then use them:

#include <cstddef>
#include <cstdio>

extern "C" char _binary_hello_manifest_start;
extern "C" char _binary_hello_manifest_end;

int main(int argc, char** argv)
{
        const std::ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
        // "%.*s" prints at most len bytes, so the data need not be zero-terminated
        std::printf("Hello world: %.*s\n", (int)len, &_binary_hello_manifest_start);
}

The symbols are the exact characters/bytes. You could also declare them as char[], but it would result in problems down the road, e.g. for the printf call.

The reason I am calculating the size myself is because a.) I don't know whether the buffer is guaranteed to be zero-terminated and b.) I didn't find any documentation on interfacing with the *_size variable.

Side-note: the .* in the format string tells printf to read the maximum number of characters to print from an int argument and then take the next argument as the string to print out.
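If you would rather not go through printf at all, the start/end pair can be copied into a std::string, which carries its own length. This is only a sketch: the helper names blob_to_string and strip_trailing_newlines are made up for illustration, and the newline trimming assumes the manifest was written with echo as in the make file above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy the bytes in [start, end) into a std::string.
// No zero terminator is required; embedded NUL bytes are preserved.
std::string blob_to_string(const char* start, const char* end)
{
    return std::string(start, static_cast<std::size_t>(end - start));
}

// Hypothetical helper: drop trailing '\n' / '\r' characters, such as the
// newline that echo appends when the .manifest file is generated.
std::string strip_trailing_newlines(std::string text)
{
    while (!text.empty() && (text.back() == '\n' || text.back() == '\r')) {
        text.pop_back();
    }
    return text;
}
```

In the program above this would be called as blob_to_string(&_binary_hello_manifest_start, &_binary_hello_manifest_end).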

You can insert any data you like into a .comment section in your output binary. You can do this with the linker after the fact, but it's probably easier to place it in your C++ code like this:

 asm  (".section .comment.manifest\n\t"
       ".string \"hello, this is a comment\"\n\t"
       ".section .text");

 int main() {
   ....

The asm statement should go outside any function, in this instance. This should work as long as your compiler puts normal functions in the .text section. If it doesn't, then you should make the obvious substitution.

The linker should gather all the .comment.manifest sections into one blob in the final binary. You can extract them from any .o or executable with this:

objdump -j .comment.manifest -s example.o
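If you want the identification tool to be self-contained rather than a wrapper around objdump, one option is to scan the raw bytes of the file for marked strings, much like an ident-style script would. The sketch below assumes that every embedded manifest string starts with a marker such as "MANIFEST:" and is zero-terminated (which .string guarantees). Both the marker and the function name find_marked_strings are hypothetical; nothing in the toolchain defines them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the payload of every string in `image` that
// starts with `marker`. A payload ends at the next NUL byte or at the end of
// the buffer. `marker` is assumed to contain no NUL bytes.
std::vector<std::string> find_marked_strings(const std::string& image,
                                             const std::string& marker)
{
    std::vector<std::string> hits;
    if (marker.empty()) {
        return hits;
    }
    std::size_t pos = 0;
    while ((pos = image.find(marker, pos)) != std::string::npos) {
        const std::size_t payload = pos + marker.size();
        std::size_t end = image.find('\0', payload);
        if (end == std::string::npos) {
            end = image.size();
        }
        hits.push_back(image.substr(payload, end - payload));
        pos = end;  // continue searching after this string
    }
    return hits;
}
```

The caller would read the binary into a std::string first, for example with std::ifstream opened in std::ios::binary mode. A plain byte scan is cruder than reading the section table, but it does not depend on section names surviving the link.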

Have you thought about using the standard packaging system of your distro? In our company we have thousands of packages, and hundreds of them are automatically deployed every day.

We are using Debian packages that contain all the necessary information:

  • Full changelog that includes:
    • authors;
    • versions;
    • short descriptions and timestamps of changes.
  • Dependency information:
    • a list of all packages that must be installed for the current one to work correctly.
  • Installation scripts that set up the environment for a package.

I think you may not need to create manifests in your own way, since a ready-made solution already exists. You can have a look at the Debian package HowTo here.
