How do I merge two binary executables?

This question follows on from another question I asked before. In short, this is one of my attempts at merging two fully linked executables into a single fully linked executable. The previous question differs in that it deals with merging an object file into a fully linked executable, which is even harder because it means I need to deal with relocations manually.

I have the following files:

example-target.c:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    puts("1234");
    return EXIT_SUCCESS;
}

example-embed.c:

#include <stdlib.h>
#include <stdio.h>

/*
 * Fake main. Never used, just there so we can perform a full link.
 */
int main(void)
{
    return EXIT_SUCCESS;
}

void func1(void)
{
    puts("asdf");
}

My goal is to merge these two executables to produce a final executable which is the same as example-target, but additionally has another main and func1.

From the point of view of the BFD library, each binary is composed (amongst other things) of a set of sections. One of the first problems I faced was that these sections had conflicting load addresses, so that if I were to merge them, the sections would overlap.
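The clash described above comes down to an interval test on load addresses. A minimal sketch, assuming each section is summarised by a hypothetical `struct sec_range` (this type and the helper name are illustrative and not part of BFD):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record: the address range occupied by one section. */
struct sec_range {
    const char *name;
    uint64_t    start;  /* load address */
    uint64_t    size;   /* size in bytes */
};

/* Two half-open ranges [start, start + size) overlap when each one
 * begins before the other ends. Empty sections never overlap. */
static bool ranges_overlap(const struct sec_range *a,
                           const struct sec_range *b)
{
    if (a->size == 0 || b->size == 0)
        return false;
    return a->start < b->start + b->size &&
           b->start < a->start + a->size;
}
```

Running this test over every pair of sections from the two executables would show which ones need to move before a merge is possible.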

To solve this, I analysed example-target programmatically to get a list of the load addresses and sizes of each of its sections. I then did the same for example-embed and used this information to dynamically generate a linker command for example-embed.c which ensures that all of its sections are linked at addresses that do not overlap with any of the sections in example-target. Hence example-embed is actually fully linked twice in this process: once to determine how many sections it has and what sizes they are, and once again to link with a guarantee that there are no section clashes with example-target.
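The generator itself is not shown in this question, so the following is only a sketch of one way such a command could be produced. It assumes the sizes and alignments from the first link are held in a hypothetical `struct embed_sec`, packs the sections consecutively from a base address chosen above everything in example-target, and formats the result as `--section-start` options. All names here are illustrative.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of one section of example-embed. */
struct embed_sec {
    const char *name;   /* e.g. ".new.text" */
    uint64_t    size;   /* bytes, taken from the first link */
    uint64_t    align;  /* non-zero power of two */
    uint64_t    start;  /* filled in by place_sections() */
};

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Assign consecutive, aligned start addresses beginning at base.
 * Returns the first free address after the last section. */
static uint64_t place_sections(struct embed_sec *secs, size_t n,
                               uint64_t base)
{
    uint64_t cursor = base;
    for (size_t i = 0; i < n; i++) {
        cursor = align_up(cursor, secs[i].align);
        secs[i].start = cursor;
        cursor += secs[i].size;
    }
    return cursor;
}

/* Write "-Wl,--section-start=NAME=0xADDR,..." into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_linker_flags(const struct embed_sec *secs, size_t n,
                               char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, buflen - used,
                         "%s--section-start=%s=0x%" PRIX64,
                         i == 0 ? "-Wl," : ",",
                         secs[i].name, secs[i].start);
        if (w < 0 || (size_t)w >= buflen - used)
            return -1;  /* output would have been truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

Packing consecutively is the simplest placement policy. A generator that wants to keep code and data in separate pages would need a second base address for the writable sections.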

On my system, the linker command produced is:

-Wl,--section-start=.new.interp=0x1004238,--section-start=.new.note.ABI-tag=0x1004254,
--section-start=.new.note.gnu.build-id=0x1004274,--section-start=.new.gnu.hash=0x1004298,
--section-start=.new.dynsym=0x10042B8,--section-start=.new.dynstr=0x1004318,
--section-start=.new.gnu.version=0x1004356,--section-start=.new.gnu.version_r=0x1004360,
--section-start=.new.rela.dyn=0x1004380,--section-start=.new.rela.plt=0x1004398,
--section-start=.new.init=0x10043C8,--section-start=.new.plt=0x10043E0,
--section-start=.new.text=0x1004410,--section-start=.new.fini=0x10045E8,
--section-start=.new.rodata=0x10045F8,--section-start=.new.eh_frame_hdr=0x1004604,
--section-start=.new.eh_frame=0x1004638,--section-start=.new.ctors=0x1204E28,
--section-start=.new.dtors=0x1204E38,--section-start=.new.jcr=0x1204E48,
--section-start=.new.dynamic=0x1204E50,--section-start=.new.got=0x1204FE0,
--section-start=.new.got.plt=0x1204FE8,--section-start=.new.data=0x1205010,
--section-start=.new.bss=0x1205020,--section-start=.new.comment=0xC04000

(Note that I prefixed the section names with .new using objcopy --prefix-sections=.new example-embedobj to avoid section name clashes.)

I then wrote some code to generate a new executable, borrowing some code from both objcopy and the book Security Warrior. The new executable should have:

  • All the sections of example-target and all the sections of example-embed
  • A symbol table which contains all the symbols of example-target and all the symbols of example-embed

The code I wrote is:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <bfd.h>
#include <libiberty.h>

struct COPYSECTION_DATA {
    bfd *      obfd;
    asymbol ** syms;
    int        symsize;
    int        symcount;
};

void copy_section(bfd * ibfd, asection * section, PTR data)
{
    struct COPYSECTION_DATA * csd  = data;
    bfd *             obfd = csd->obfd;
    asection *        s;
    long              size, count, sz_reloc;

    if((bfd_get_section_flags(ibfd, section) & SEC_GROUP) != 0) {
        return;
    }

    /* get output section from input section struct */
    s        = section->output_section;
    /* get sizes for copy */
    size     = bfd_get_section_size(section);
    sz_reloc = bfd_get_reloc_upper_bound(ibfd, section);

    if(!sz_reloc) {
        /* no relocations */
        bfd_set_reloc(obfd, s, NULL, 0);
    } else if(sz_reloc > 0) {
        arelent ** buf;

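        /* build relocations */
        buf   = xmalloc(sz_reloc);
        count = bfd_canonicalize_reloc(ibfd, section, buf, csd->syms);
        /* set relocations for the output section */
        bfd_set_reloc(obfd, s, count ? buf : NULL, count);
        /* obfd keeps pointing at buf when relocations exist, so only
         * release it here when it is unused */
        if(count == 0) {
            free(buf);
        }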
    }

    /* get input section contents, set output section contents */
    if(section->flags & SEC_HAS_CONTENTS) {
        bfd_byte * memhunk = NULL;
        bfd_get_full_section_contents(ibfd, section, &memhunk);
        bfd_set_section_contents(obfd, s, memhunk, 0, size);
        free(memhunk);
    }
}

void define_section(bfd * ibfd, asection * section, PTR data)
{
    bfd *      obfd = data;
    asection * s    = bfd_make_section_anyway_with_flags(obfd,
            section->name, bfd_get_section_flags(ibfd, section));
    /* set size to same as ibfd section */
    bfd_set_section_size(obfd, s, bfd_section_size(ibfd, section));

    /* set vma */
    bfd_set_section_vma(obfd, s, bfd_section_vma(ibfd, section));
    /* set load address */
    s->lma = section->lma;
    /* set alignment -- the power 2 will be raised to */
    bfd_set_section_alignment(obfd, s,
            bfd_section_alignment(ibfd, section));
    s->alignment_power = section->alignment_power;
    /* link the output section to the input section */
    section->output_section = s;
    section->output_offset  = 0;

    /* copy merge entity size */
    s->entsize = section->entsize;

    /* copy private BFD data from ibfd section to obfd section */
    bfd_copy_private_section_data(ibfd, section, obfd, s);
}

void merge_symtable(bfd * ibfd, bfd * embedbfd, bfd * obfd,
        struct COPYSECTION_DATA * csd)
{
    /* set obfd */
    csd->obfd     = obfd;

    /* get required size for both symbol tables and allocate memory */
    csd->symsize  = bfd_get_symtab_upper_bound(ibfd) /********+
            bfd_get_symtab_upper_bound(embedbfd) */;
    csd->syms     = xmalloc(csd->symsize);

    csd->symcount =  bfd_canonicalize_symtab (ibfd, csd->syms);
    /******** csd->symcount += bfd_canonicalize_symtab (embedbfd,
            csd->syms + csd->symcount); */

    /* copy merged symbol table to obfd */
    bfd_set_symtab(obfd, csd->syms, csd->symcount);
}

bool merge_object(bfd * ibfd, bfd * embedbfd, bfd * obfd)
{
    struct COPYSECTION_DATA csd = {0};

    if(!ibfd || !embedbfd || !obfd) {
        return FALSE;
    }

    /* set output parameters to ibfd settings */
    bfd_set_format(obfd, bfd_get_format(ibfd));
    bfd_set_arch_mach(obfd, bfd_get_arch(ibfd), bfd_get_mach(ibfd));
    bfd_set_file_flags(obfd, bfd_get_file_flags(ibfd) &
            bfd_applicable_file_flags(obfd));

    /* set the entry point of obfd */
    bfd_set_start_address(obfd, bfd_get_start_address(ibfd));

    /* define sections for output file */
    bfd_map_over_sections(ibfd, define_section, obfd);
    /******** bfd_map_over_sections(embedbfd, define_section, obfd); */

    /* merge private data into obfd */
    bfd_merge_private_bfd_data(ibfd, obfd);
    /******** bfd_merge_private_bfd_data(embedbfd, obfd); */

    merge_symtable(ibfd, embedbfd, obfd, &csd);

    bfd_map_over_sections(ibfd, copy_section, &csd);
    /******** bfd_map_over_sections(embedbfd, copy_section, &csd); */

    free(csd.syms);
    return TRUE;
}

int main(int argc, char **argv)
{
    bfd * ibfd;
    bfd * embedbfd;
    bfd * obfd;

    if(argc != 4) {
        /* not an errno failure, so print the usage text directly */
        fprintf(stderr, "Usage: infile embedfile outfile\n");
        xexit(-1);
    }

    bfd_init();
    ibfd     = bfd_openr(argv[1], NULL);
    embedbfd = bfd_openr(argv[2], NULL);

    if(ibfd == NULL || embedbfd == NULL) {
        perror("asdfasdf");
        xexit(-1);
    }

    if(!bfd_check_format(ibfd, bfd_object) ||
            !bfd_check_format(embedbfd, bfd_object)) {
        perror("File format error");
        xexit(-1);
    }

    obfd = bfd_openw(argv[3], NULL);
    bfd_set_format(obfd, bfd_object);

    if(!(merge_object(ibfd, embedbfd, obfd))) {
        perror("Error merging input/obj");
        xexit(-1);
    }

    bfd_close(ibfd);
    bfd_close(embedbfd);
    bfd_close(obfd);
    return EXIT_SUCCESS;
}

To summarise, this code takes two input files (ibfd and embedbfd) and generates an output file (obfd). It does the following:

  • Copies the format/arch/mach/file flags and start address from ibfd to obfd.
  • Defines the sections of both ibfd and embedbfd in obfd. Population of the sections happens separately because BFD mandates that all sections are created before any of them is populated.
  • Merges the private data of both input BFDs into the output BFD. Since BFD is a common abstraction above many file formats, it is not necessarily able to comprehensively encapsulate everything required by the underlying file format.
  • Creates a combined symbol table consisting of the symbol tables of ibfd and embedbfd and sets this as the symbol table of obfd. This symbol table is saved so it can later be used to build relocation information.
  • Copies the sections from ibfd to obfd. As well as copying the section contents, this step also deals with building and setting the relocation table.
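The combined symbol table step amounts to concatenating two NULL-terminated pointer tables into one allocation large enough for both. A minimal sketch of that idea, using `void *` entries as a stand-in for `asymbol *` so that it compiles without libbfd (the helper names are hypothetical and this is not BFD code):

```c
#include <stdlib.h>

/* Count the entries of a NULL-terminated pointer table. */
static size_t table_len(void *const *tab)
{
    size_t n = 0;
    while (tab[n] != NULL)
        n++;
    return n;
}

/* Return a newly allocated NULL-terminated table holding the entries
 * of a followed by the entries of b. The entries themselves are not
 * copied. On success *count (if non-NULL) receives the number of
 * entries; on allocation failure NULL is returned. */
static void **merge_tables(void *const *a, void *const *b, size_t *count)
{
    size_t na = table_len(a);
    size_t nb = table_len(b);
    void **out = malloc((na + nb + 1) * sizeof *out);

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < na; i++)
        out[i] = a[i];
    for (size_t i = 0; i < nb; i++)
        out[na + i] = b[i];
    out[na + nb] = NULL;  /* keep the table NULL-terminated */

    if (count != NULL)
        *count = na + nb;
    return out;
}
```

The same shape applies to the code above: the buffer has to be sized for both tables before the second one is written at offset `symcount`.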

In the code above, some lines are commented out with /******** */. These lines deal with the merging of example-embed. If they are commented out, obfd is simply built as a copy of ibfd. I have tested this and it works fine. However, once I uncomment these lines, problems start occurring.

The uncommented version, which does the full merge, still generates an output file. This output file can be inspected with objdump and is found to have all the sections, code and symbol tables of both inputs. However, objdump complains with:

BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708

On my system, line 1708 of elf.c is:

BFD_ASSERT (elf_dynsymtab (abfd) == 0);

elf_dynsymtab is a macro in elf-bfd.h for:

#define elf_dynsymtab(bfd)  (elf_tdata(bfd) -> dynsymtab_section)

I'm not familiar with the ELF layer, but I believe this is a problem reading the dynamic symbol table (or perhaps it is saying that the table is not present). For the time being, I am trying to avoid reaching down directly into the ELF layer unless necessary. Is anyone able to tell me what I'm doing wrong, either in my code or conceptually?
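One way to narrow this down without touching BFD internals would be to count how many section headers in the output file have type SHT_DYNSYM. The assertion checks that no dynamic symbol table section has been recorded yet, so a file carrying both .dynsym and .new.dynsym might trip it. That reading is an assumption on my part, not something confirmed here. A minimal sketch, assuming a 64-bit ELF whose section headers have already been read into memory and a Linux system that provides `<elf.h>` (the function name is hypothetical):

```c
#include <elf.h>
#include <stddef.h>

/* Count the section headers whose sh_type equals type,
 * for example SHT_DYNSYM for dynamic symbol tables. */
static size_t count_sections_of_type(const Elf64_Shdr *shdrs, size_t n,
                                     Elf64_Word type)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (shdrs[i].sh_type == type)
            hits++;
    }
    return hits;
}
```

A result greater than one for SHT_DYNSYM would be consistent with that reading; a result of one or zero would point elsewhere.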

If it is helpful, I can also post the code for the linker command generation or compiled versions of the example binaries.


I realise that this is a very large question and for this reason I would like to properly reward anyone who is able to help me with it. If I am able to solve this with someone's help, I am happy to award a 500+ bonus.

Why do all of this manually? Given that you have all the symbol information (which you must have if you want to edit the binary in a sane way), wouldn't it be easier to split the executable into separate object files (say, one object file per function), do your editing, and relink it?
