How do I merge two binary executables?
This question follows on from another question I asked before. In short, this is one of my attempts at merging two fully linked executables into a single fully linked executable. The difference is that the previous question deals with merging an object file into a fully linked executable, which is even harder because it means I need to deal with relocations manually.
What I have are the following files:
example-target.c:
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    puts("1234");
    return EXIT_SUCCESS;
}
example-embed.c:
#include <stdlib.h>
#include <stdio.h>

/*
 * Fake main. Never used, just there so we can perform a full link.
 */
int main(void)
{
    return EXIT_SUCCESS;
}

void func1(void)
{
    puts("asdf");
}
My goal is to merge these two executables to produce a final executable which is the same as example-target, but additionally has another main and func1.
From the point of view of the BFD library, each binary is composed (amongst other things) of a set of sections. One of the first problems I faced was that these sections had conflicting load addresses (such that if I were to merge them, the sections would overlap).
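To make "overlap" concrete: two sections clash when their load-address ranges intersect. A minimal sketch of that check, treating each section as the half-open range [lma, lma + size) and ignoring empty sections. The struct sec_range type and both function names are hypothetical, not part of BFD, and any addresses used with them here are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section descriptor: name, load address (LMA) and size. */
struct sec_range {
    const char *name;
    uint64_t lma;
    uint64_t size;
};

/* A section occupies the half-open range [lma, lma + size). Two ranges
   intersect iff each one starts before the other ends; empty sections
   never clash with anything. */
bool ranges_overlap(const struct sec_range *x, const struct sec_range *y)
{
    if (x->size == 0 || y->size == 0)
        return false;
    return x->lma < y->lma + y->size && y->lma < x->lma + x->size;
}

/* Number of (a[i], b[j]) pairs whose ranges intersect. */
size_t count_clashes(const struct sec_range *a, size_t na,
                     const struct sec_range *b, size_t nb)
{
    size_t clashes = 0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (ranges_overlap(&a[i], &b[j]))
                clashes++;
    return clashes;
}
```

A layout for the embedded sections is acceptable when count_clashes() over the target's and the embed's section lists returns 0.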
What I did to solve this was to analyse example-target programmatically to get a list of the load address and size of each of its sections. I then did the same for example-embed and used this information to dynamically generate a linker command for example-embed.c which ensures that all of its sections are linked at addresses that do not overlap with any of the sections in example-target. Hence example-embed is actually fully linked twice in this process: once to determine how many sections it has and what sizes they are, and once again to link it with a guarantee that there are no section clashes with example-target.
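A sketch of the layout step described above, assuming the sizes and alignment powers of example-embed's (prefixed) sections are already known, for instance from the first link. It finds the highest end address among example-target's sections and, from a base chosen by the caller, packs the embed sections one after another, rounding each start up to its alignment and emitting -Wl,--section-start=NAME=0xADDR options. struct embed_sec, highest_end, align_up and place_sections are hypothetical names. The sketch deliberately ignores segment boundaries, page alignment and permissions (read-only versus writable sections), which a real layout also has to respect.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one (prefixed) section of example-embed. */
struct embed_sec {
    const char *name;      /* e.g. ".new.text" after objcopy --prefix-sections */
    uint64_t size;         /* size in bytes */
    unsigned align_power;  /* alignment is 2^align_power, as in BFD */
};

/* Highest end address (start + size) over example-target's sections. */
uint64_t highest_end(const uint64_t *start, const uint64_t *size, size_t n)
{
    uint64_t end = 0;
    for (size_t i = 0; i < n; i++)
        if (start[i] + size[i] > end)
            end = start[i] + size[i];
    return end;
}

/* Round addr up to the next multiple of 2^power. */
uint64_t align_up(uint64_t addr, unsigned power)
{
    uint64_t mask = ((uint64_t)1 << power) - 1;
    return (addr + mask) & ~mask;
}

/* Pack the sections back to back from 'base', aligning each one, and write
   "-Wl,--section-start=NAME=0xADDR,..." into out (truncated if outsz is too
   small). Returns the first address after the last placed section. */
uint64_t place_sections(const struct embed_sec *secs, size_t n, uint64_t base,
                        char *out, size_t outsz)
{
    uint64_t addr = base;
    int len = snprintf(out, outsz, "-Wl");
    for (size_t i = 0; i < n; i++) {
        addr = align_up(addr, secs[i].align_power);
        /* only keep appending while everything written so far still fits */
        if (len >= 0 && (size_t)len < outsz)
            len += snprintf(out + len, outsz - (size_t)len,
                            ",--section-start=%s=0x%" PRIX64, secs[i].name, addr);
        addr += secs[i].size;
    }
    return addr;
}
```

Choosing the base (for example, a page-aligned address comfortably above highest_end()) is left to the caller.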
On my system, the linker command produced is:
-Wl,--section-start=.new.interp=0x1004238,--section-start=.new.note.ABI-tag=0x1004254,
--section-start=.new.note.gnu.build-id=0x1004274,--section-start=.new.gnu.hash=0x1004298,
--section-start=.new.dynsym=0x10042B8,--section-start=.new.dynstr=0x1004318,
--section-start=.new.gnu.version=0x1004356,--section-start=.new.gnu.version_r=0x1004360,
--section-start=.new.rela.dyn=0x1004380,--section-start=.new.rela.plt=0x1004398,
--section-start=.new.init=0x10043C8,--section-start=.new.plt=0x10043E0,
--section-start=.new.text=0x1004410,--section-start=.new.fini=0x10045E8,
--section-start=.new.rodata=0x10045F8,--section-start=.new.eh_frame_hdr=0x1004604,
--section-start=.new.eh_frame=0x1004638,--section-start=.new.ctors=0x1204E28,
--section-start=.new.dtors=0x1204E38,--section-start=.new.jcr=0x1204E48,
--section-start=.new.dynamic=0x1204E50,--section-start=.new.got=0x1204FE0,
--section-start=.new.got.plt=0x1204FE8,--section-start=.new.data=0x1205010,
--section-start=.new.bss=0x1205020,--section-start=.new.comment=0xC04000
(Note that I prefixed section names with .new using objcopy --prefix-sections=.new example-embedobj to avoid section name clashes.)
I then wrote some code to generate a new executable (borrowing some code from both objcopy and the Security Warrior book). The new executable should have:

- all the sections of example-target and all the sections of example-embed
- a symbol table containing all the symbols of example-target and all the symbols of example-embed

The code I wrote is:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <bfd.h>
#include <libiberty.h>

struct COPYSECTION_DATA {
    bfd * obfd;
    asymbol ** syms;
    int symsize;
    int symcount;
};

void copy_section(bfd * ibfd, asection * section, void * data)
{
    struct COPYSECTION_DATA * csd = data;
    bfd * obfd = csd->obfd;
    asection * s;
    long size, count, sz_reloc;

    if((bfd_get_section_flags(ibfd, section) & SEC_GROUP) != 0) {
        return;
    }

    /* get output section from input section struct */
    s = section->output_section;

    /* get sizes for copy */
    size = bfd_get_section_size(section);
    sz_reloc = bfd_get_reloc_upper_bound(ibfd, section);

    if(!sz_reloc) {
        /* no relocations */
        bfd_set_reloc(obfd, s, NULL, 0);
    } else if(sz_reloc > 0) {
        arelent ** buf;

        /* build relocations */
        buf = xmalloc(sz_reloc);
        count = bfd_canonicalize_reloc(ibfd, section, buf, csd->syms);

        /* set relocations for the output section. bfd_set_reloc() only
           stores the pointer (the relocations are written when obfd is
           closed), so buf may only be freed if it is not used. */
        bfd_set_reloc(obfd, s, count ? buf : NULL, count);
        if(count == 0) {
            free(buf);
        }
    }

    /* get input section contents, set output section contents */
    if(section->flags & SEC_HAS_CONTENTS) {
        bfd_byte * memhunk = NULL;
        if(bfd_get_full_section_contents(ibfd, section, &memhunk)) {
            bfd_set_section_contents(obfd, s, memhunk, 0, size);
        }
        free(memhunk);
    }
}

void define_section(bfd * ibfd, asection * section, void * data)
{
    bfd * obfd = data;
    asection * s = bfd_make_section_anyway_with_flags(obfd,
            section->name, bfd_get_section_flags(ibfd, section));

    /* set size to same as ibfd section */
    bfd_set_section_size(obfd, s, bfd_section_size(ibfd, section));

    /* set vma */
    bfd_set_section_vma(obfd, s, bfd_section_vma(ibfd, section));

    /* set load address */
    s->lma = section->lma;

    /* set alignment -- the power 2 will be raised to */
    bfd_set_section_alignment(obfd, s,
            bfd_section_alignment(ibfd, section));
    s->alignment_power = section->alignment_power;

    /* link the output section to the input section */
    section->output_section = s;
    section->output_offset = 0;

    /* copy merge entity size */
    s->entsize = section->entsize;

    /* copy private BFD data from ibfd section to obfd section */
    bfd_copy_private_section_data(ibfd, section, obfd, s);
}

void merge_symtable(bfd * ibfd, bfd * embedbfd, bfd * obfd,
                    struct COPYSECTION_DATA * csd)
{
    /* set obfd */
    csd->obfd = obfd;

    /* get required size for both symbol tables and allocate memory */
    csd->symsize = bfd_get_symtab_upper_bound(ibfd) /********+
                   bfd_get_symtab_upper_bound(embedbfd) */;
    csd->syms = xmalloc(csd->symsize);

    csd->symcount = bfd_canonicalize_symtab (ibfd, csd->syms);
    /******** csd->symcount += bfd_canonicalize_symtab (embedbfd,
              csd->syms + csd->symcount); */

    /* copy merged symbol table to obfd */
    bfd_set_symtab(obfd, csd->syms, csd->symcount);
}

bool merge_object(bfd * ibfd, bfd * embedbfd, bfd * obfd)
{
    struct COPYSECTION_DATA csd = {0};

    if(!ibfd || !embedbfd || !obfd) {
        return false;
    }

    /* set output parameters to ibfd settings */
    bfd_set_format(obfd, bfd_get_format(ibfd));
    bfd_set_arch_mach(obfd, bfd_get_arch(ibfd), bfd_get_mach(ibfd));
    bfd_set_file_flags(obfd, bfd_get_file_flags(ibfd) &
                       bfd_applicable_file_flags(obfd));

    /* set the entry point of obfd */
    bfd_set_start_address(obfd, bfd_get_start_address(ibfd));

    /* define sections for output file */
    bfd_map_over_sections(ibfd, define_section, obfd);
    /******** bfd_map_over_sections(embedbfd, define_section, obfd); */

    /* merge private data into obfd */
    bfd_merge_private_bfd_data(ibfd, obfd);
    /******** bfd_merge_private_bfd_data(embedbfd, obfd); */

    merge_symtable(ibfd, embedbfd, obfd, &csd);

    bfd_map_over_sections(ibfd, copy_section, &csd);
    /******** bfd_map_over_sections(embedbfd, copy_section, &csd); */

    /* Do not free csd.syms here: bfd_set_symtab() only stores the pointer,
       and the relocations built in copy_section() point into the same
       table. Both are read when obfd is written out by bfd_close(), so
       the table stays allocated until the program exits. */
    return true;
}

int main(int argc, char **argv)
{
    bfd * ibfd;
    bfd * embedbfd;
    bfd * obfd;

    if(argc != 4) {
        fprintf(stderr, "Usage: %s infile embedfile outfile\n", argv[0]);
        xexit(EXIT_FAILURE);
    }

    bfd_init();
    ibfd = bfd_openr(argv[1], NULL);
    embedbfd = bfd_openr(argv[2], NULL);
    if(ibfd == NULL || embedbfd == NULL) {
        bfd_perror("bfd_openr");
        xexit(EXIT_FAILURE);
    }

    if(!bfd_check_format(ibfd, bfd_object) ||
       !bfd_check_format(embedbfd, bfd_object)) {
        bfd_perror("File format error");
        xexit(EXIT_FAILURE);
    }

    obfd = bfd_openw(argv[3], NULL);
    if(obfd == NULL) {
        bfd_perror("bfd_openw");
        xexit(EXIT_FAILURE);
    }
    bfd_set_format(obfd, bfd_object);

    if(!(merge_object(ibfd, embedbfd, obfd))) {
        fprintf(stderr, "Error merging input/obj\n");
        xexit(EXIT_FAILURE);
    }

    /* close obfd first: the output is only written here, and it still
       refers to symbols and sections owned by the input BFDs */
    if(!bfd_close(obfd)) {
        bfd_perror("bfd_close");
        xexit(EXIT_FAILURE);
    }
    bfd_close(embedbfd);
    bfd_close(ibfd);
    return EXIT_SUCCESS;
}
To summarise what this code does, it takes two input files (ibfd and embedbfd) and generates an output file (obfd). It:

1. Copies the format/arch/mach/file flags and start address from ibfd to obfd.
2. Defines the sections of ibfd and embedbfd in obfd. Population of the sections happens separately because BFD mandates that all sections are created before any of them start to be populated.
3. Creates a combined symbol table from the symbol tables of ibfd and embedbfd and sets this as the symbol table of obfd. This symbol table is saved so it can later be used to build relocation information.
4. Copies the sections from ibfd to obfd. As well as copying the section contents, this step also deals with building and setting the relocation table.

In the code above, some lines are commented out with /******** */. These lines deal with the merging of example-embed. If they are commented out, obfd is simply built as a copy of ibfd. I have tested this and it works fine. However, once I comment these lines back in, problems start occurring.
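The combined symbol table built in merge_symtable relies on two documented properties of the BFD calls it uses: bfd_get_symtab_upper_bound returns the number of bytes needed for the vector of symbol pointers including a terminating NULL, and bfd_canonicalize_symtab fills the vector, appends that NULL and returns the count without it. The sketch below models this with plain strings in place of asymbol pointers; the fake_* functions and merge_symbol_tables are hypothetical stand-ins, not BFD calls. It shows why summing the two upper bounds is enough: the second call starts at syms + count and overwrites the first table's terminator, leaving exactly one NULL at the end.

```c
#include <stdlib.h>

/* Stand-in for bfd_get_symtab_upper_bound(): bytes needed for a vector of
   symbol pointers, including the terminating NULL. Symbols are modelled as
   plain name strings here. */
long fake_symtab_upper_bound(const char *const *table)
{
    long n = 0;
    while (table[n] != NULL)
        n++;
    return (long)((size_t)(n + 1) * sizeof(const char *));
}

/* Stand-in for bfd_canonicalize_symtab(): copies the entries into dst,
   appends a NULL and returns the count excluding the NULL. */
long fake_canonicalize_symtab(const char **dst, const char *const *table)
{
    long n = 0;
    for (; table[n] != NULL; n++)
        dst[n] = table[n];
    dst[n] = NULL;
    return n;
}

/* Same pattern as the commented-out lines in merge_symtable(): one buffer
   sized by the sum of both upper bounds, with the second table written
   starting at syms + count. */
const char **merge_symbol_tables(const char *const *a, const char *const *b,
                                 long *count)
{
    const char **syms = malloc((size_t)(fake_symtab_upper_bound(a) +
                                        fake_symtab_upper_bound(b)));
    if (syms == NULL)
        return NULL;
    *count = fake_canonicalize_symtab(syms, a);
    *count += fake_canonicalize_symtab(syms + *count, b);
    return syms;
}
```

With a = {"main", "puts"} and b = {"main", "func1"}, the merged vector holds four entries followed by a single NULL.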
The uncommented version, which does the full merge, still generates an output file. This output file can be inspected with objdump and is found to have all the sections, code and symbol tables of both inputs. However, objdump complains with:
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
On my system, line 1708 of elf.c is:
BFD_ASSERT (elf_dynsymtab (abfd) == 0);
elf_dynsymtab is a macro in elf-bfd.h for:
#define elf_dynsymtab(bfd) (elf_tdata(bfd) -> dynsymtab_section)
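The macro reads the section index that BFD's ELF reader records for the file's dynamic symbol table. As far as I can tell from bfd/elf.c, this assertion sits in the handling of SHT_DYNSYM sections in bfd_section_from_shdr, where BFD remembers the first SHT_DYNSYM section it sees and asserts that none was recorded before. If that reading is right, it would fire whenever a file contains more than one SHT_DYNSYM section, for example both .dynsym and .new.dynsym. One way to test that hypothesis on the merged output is to count section headers by type (readelf -S shows the same information). The sketch below works on an ELF64 image already loaded into memory and uses only the <elf.h> header shipped with glibc. It assumes the file's byte order matches the host's and does not handle extended section numbering; count_sections_of_type is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the number of section headers of type 'type' in an ELF64 image
   held in memory, or -1 if the image is not a well-formed ELF64 file.
   Assumes the image's byte order matches the host's. */
int count_sections_of_type(const unsigned char *img, size_t len, Elf64_Word type)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);

    /* check the magic number and the 64-bit class */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_shentsize != sizeof(Elf64_Shdr))
        return -1;

    /* make sure the whole section header table lies inside the image */
    if (eh.e_shoff > len ||
        (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;

    int count = 0;
    for (Elf64_Half i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + (size_t)i * sizeof sh, sizeof sh);
        if (sh.sh_type == type)
            count++;
    }
    return count;
}
```

Calling it with SHT_DYNSYM on the merged file and getting 2 would support the hypothesis; a single-input copy of example-target should give 1.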
I'm not familiar with the ELF layer, but I believe this is a problem reading the dynamic symbol table (or perhaps an indication that it is not present). For the time being, I am trying to avoid reaching down directly into the ELF layer unless necessary. Is anyone able to tell me what I'm doing wrong, either in my code or conceptually?
If it is helpful, I can also post the code for the linker command generation or compiled versions of the example binaries.
I realise that this is a very large question and for this reason, I would like to properly reward anyone who is able to help me with it. If I am able to solve this with the help of someone, I am happy to award a 500+ bonus.
Why do all of this manually? Given that you have all symbol information (which you must if you want to edit the binary in a sane way), wouldn't it be easier to SPLIT the executable into separate object files (say, one object file per function), do your editing, and relink it?
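Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.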