
Get the start and end address of the text section in an executable

I need to get the start and end address of an executable's text section. How can I get it?

I can get the starting address from the _init symbol or the _start symbol, but what about the ending address? Should I consider the end of the text section to be the last address before the start of the .rodata section?

Or should I edit the default ld script, add my own symbols marking the start and end of the text section, and pass it to GCC when compiling? In that case, where should I place the new symbols, and should I take the init and fini sections into account?

What is a good way to get the start and end address of the text section?

The GNU binutils default linker scripts for ELF-based platforms normally define quite a number of different symbols which can be used to find the start and end of various sections.

The end of the text section is usually referenced by any of three different symbols: etext, _etext or __etext; the start can be found as __executable_start. (Note that these symbols are usually exported using the PROVIDE() mechanism, which means that they will be overridden if something else in your executable defines them rather than merely referencing them. In particular, that means that _etext or __etext are likely to be safer choices than etext.)

Example:

$ cat etext.c
#include <stdio.h>

extern char __executable_start;
extern char __etext;

int main(void)
{
  printf("0x%lx\n", (unsigned long)&__executable_start);
  printf("0x%lx\n", (unsigned long)&__etext);
  return 0;
}
$ gcc -Wall -o etext etext.c
$ ./etext
0x8048000
0x80484a0
$
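Building on the example above, a small helper can test whether a given address falls between __executable_start and __etext. This is only a sketch that assumes GNU ld's default linker script (which PROVIDEs both symbols); the function name in_text_range is hypothetical. Note that the range starts at the beginning of the loaded image, so besides .text it also covers the ELF headers and the sections placed before .text (such as .init and .plt), plus .fini, which the default script places before __etext.

```c
#include <stdint.h>

/* Assumption: both symbols are PROVIDEd by the GNU ld default linker
   script; no standard guarantees that they exist. */
extern char __executable_start; /* start of the loaded image */
extern char __etext;            /* end of the text (code) area */

/* Hypothetical helper: returns nonzero if p lies in the half-open range
   [__executable_start, __etext). */
int in_text_range(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    return a >= (uintptr_t)&__executable_start && a < (uintptr_t)&__etext;
}
```

For instance, in_text_range((const void *)main) should return nonzero, while the address of a local variable should not. Casting a function pointer to const void * is not strictly portable C, but GCC on Linux accepts it.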

I don't believe that any of these symbols are specified by any standard, so this shouldn't be assumed to be portable (I have no idea whether even GNU binutils provides them for all ELF-based platforms, or whether the set of symbols provided has changed across binutils versions). That said, I guess that if a) you are doing something that needs this information, and b) you're considering hacked linker scripts as an option, then portability isn't too much of a concern!

To see the exact set of symbols you get when building a particular thing on a particular platform, give the --verbose flag to ld (or -Wl,--verbose to gcc) to print the linker script it chooses to use. (There are actually several different default linker scripts, which vary according to the linker options and the type of object you're building.)

It's incorrect to speak of "the" text segment, since there may be more than one. This is guaranteed in the usual case where you have shared libraries (each one brings its own), but even a single ELF binary can have multiple PT_LOAD segments with the same flags.

The following sample program dumps out all the information returned by dl_iterate_phdr. You're interested in any segment of type PT_LOAD with the PF_X flag (note that PT_GNU_STACK will also carry that flag if -z execstack is passed to the linker, so you really do have to check both the type and the flags).

#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

const char *type_str(ElfW(Word) type)
{
    switch (type)
    {
    case PT_NULL:
        return "PT_NULL"; // should not be seen at runtime, only in the file!
    case PT_LOAD:
        return "PT_LOAD";
    case PT_DYNAMIC:
        return "PT_DYNAMIC";
    case PT_INTERP:
        return "PT_INTERP";
    case PT_NOTE:
        return "PT_NOTE";
    case PT_SHLIB:
        return "PT_SHLIB";
    case PT_PHDR:
        return "PT_PHDR";
    case PT_TLS:
        return "PT_TLS";
    case PT_GNU_EH_FRAME:
        return "PT_GNU_EH_FRAME";
    case PT_GNU_STACK:
        return "PT_GNU_STACK";
    case PT_GNU_RELRO:
        return "PT_GNU_RELRO";
    case PT_SUNWBSS:
        return "PT_SUNWBSS";
    case PT_SUNWSTACK:
        return "PT_SUNWSTACK";
    default:
        if (PT_LOOS <= type && type <= PT_HIOS)
        {
            return "Unknown OS-specific";
        }
        if (PT_LOPROC <= type && type <= PT_HIPROC)
        {
            return "Unknown processor-specific";
        }
        return "Unknown";
    }
}

const char *flags_str(ElfW(Word) flags)
{
    switch (flags & (PF_R | PF_W | PF_X))
    {
    case 0 | 0 | 0:
        return "none";
    case 0 | 0 | PF_X:
        return "x";
    case 0 | PF_W | 0:
        return "w";
    case 0 | PF_W | PF_X:
        return "wx";
    case PF_R | 0 | 0:
        return "r";
    case PF_R | 0 | PF_X:
        return "rx";
    case PF_R | PF_W | 0:
        return "rw";
    case PF_R | PF_W | PF_X:
        return "rwx";
    }
    __builtin_unreachable();
}

static int callback(struct dl_phdr_info *info, size_t size, void *data)
{
    int j;
    (void)data;

    printf("object \"%s\"\n", info->dlpi_name);
    printf("  base address: %p\n", (void *)info->dlpi_addr);
    if (size > offsetof(struct dl_phdr_info, dlpi_adds))
    {
        printf("  adds: %llu\n", info->dlpi_adds);
    }
    if (size > offsetof(struct dl_phdr_info, dlpi_subs))
    {
        printf("  subs: %llu\n", info->dlpi_subs);
    }
    if (size > offsetof(struct dl_phdr_info, dlpi_tls_modid))
    {
        printf("  tls modid: %zu\n", info->dlpi_tls_modid);
    }
    if (size > offsetof(struct dl_phdr_info, dlpi_tls_data))
    {
        printf("  tls data: %p\n", info->dlpi_tls_data);
    }
    printf("  segments: %d\n", info->dlpi_phnum);

    for (j = 0; j < info->dlpi_phnum; j++)
    {
        const ElfW(Phdr) *hdr = &info->dlpi_phdr[j];
        printf("    segment %2d\n", j);
        printf("      type: 0x%08X (%s)\n", hdr->p_type, type_str(hdr->p_type));
        printf("      file offset: 0x%08zX\n", hdr->p_offset);
        printf("      virtual addr: %p\n", (void *)hdr->p_vaddr);
        printf("      physical addr: %p\n", (void *)hdr->p_paddr);
        printf("      file size: 0x%08zX\n", hdr->p_filesz);
        printf("      memory size: 0x%08zX\n", hdr->p_memsz);
        printf("      flags: 0x%08X (%s)\n", hdr->p_flags, flags_str(hdr->p_flags));
        printf("      align: %zu\n", hdr->p_align);
        /* Runtime address = load bias (dlpi_addr) + link-time p_vaddr; the end is exclusive. */
        if (hdr->p_memsz)
        {
            printf("      derived address range: %p to %p\n",
                (void *) (info->dlpi_addr + hdr->p_vaddr),
                (void *) (info->dlpi_addr + hdr->p_vaddr + hdr->p_memsz));
        }
    }
    return 0;
}

int main(void)
{
    dl_iterate_phdr(callback, NULL);

    exit(EXIT_SUCCESS);
}
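If you only need the executable range that contains a particular address, the same dl_iterate_phdr walk can be narrowed to PT_LOAD segments with PF_X set. The sketch below stops at the first matching segment; find_exec_segment, find_exec_cb and struct exec_query are hypothetical names. Because every loaded object (the executable and each shared library) reports its own program headers, it works for addresses inside libraries as well.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>

/* Hypothetical bookkeeping passed through dl_iterate_phdr's data pointer. */
struct exec_query {
    uintptr_t addr;  /* address to locate */
    uintptr_t start; /* out: start of the matching segment */
    uintptr_t end;   /* out: one past its end */
    int found;
};

static int find_exec_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct exec_query *q = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        /* Only loadable segments with the execute flag hold code. */
        if (ph->p_type != PT_LOAD || !(ph->p_flags & PF_X))
            continue;
        uintptr_t lo = info->dlpi_addr + ph->p_vaddr; /* bias + link-time addr */
        uintptr_t hi = lo + ph->p_memsz;
        if (q->addr >= lo && q->addr < hi) {
            q->start = lo;
            q->end = hi;
            q->found = 1;
            return 1; /* a nonzero return stops dl_iterate_phdr */
        }
    }
    return 0; /* keep iterating over the remaining objects */
}

/* Returns nonzero and fills *start / *end if addr lies in an executable
   PT_LOAD segment of any currently loaded object. */
int find_exec_segment(const void *addr, uintptr_t *start, uintptr_t *end)
{
    struct exec_query q = { (uintptr_t)addr, 0, 0, 0 };

    dl_iterate_phdr(find_exec_cb, &q);
    if (q.found) {
        *start = q.start;
        *end = q.end;
    }
    return q.found;
}
```

Passing the address of a function (for example main) should report the executable segment containing it; the address of a stack variable should not match anything unless the stack was made executable and happens to be mapped as a PT_LOAD segment, which is not the normal case.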

On Linux, consider using the nm(1) tool to inspect which symbols the object file provides. You can pick through this set of symbols, where you will find both of the symbols that Matthew Slattery provided in his answer.

.rodata is not guaranteed to always come directly after .text. You can use objdump -h file and readelf --sections file to get more information. With objdump you get both the size of each section and its offset into the file.
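To get the same section information from inside a program rather than from objdump or readelf, you can read the section header table yourself and look .text up by name. This is a minimal sketch under stated assumptions: Linux (it is meant to be called with /proc/self/exe), a file of the host's native ELF class, and no extended section numbering; find_section is a hypothetical name. Keep in mind that sh_addr is a link-time address, so for a position-independent executable you still have to add the load bias (for example dlpi_addr from dl_iterate_phdr) to get a runtime address.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): native-class ELF types */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Looks up the section called `want` in the ELF file at `path`. On success
   stores its link-time address (sh_addr) and size (sh_size) and returns 1;
   returns 0 otherwise. */
int find_section(const char *path, const char *want,
                 unsigned long *addr, unsigned long *size)
{
    ElfW(Ehdr) eh;
    ElfW(Shdr) *sh = NULL;
    char *names = NULL;
    int found = 0;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    if (fread(&eh, sizeof eh, 1, f) == 1
        && memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0
        && eh.e_ident[EI_CLASS] == (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32)
        && eh.e_shnum > 0 && eh.e_shstrndx < eh.e_shnum
        && (sh = calloc(eh.e_shnum, sizeof *sh)) != NULL
        && fseek(f, (long)eh.e_shoff, SEEK_SET) == 0
        && fread(sh, sizeof *sh, eh.e_shnum, f) == eh.e_shnum) {
        /* Section names live in another section, indexed by e_shstrndx. */
        const ElfW(Shdr) *strsec = &sh[eh.e_shstrndx];

        names = malloc(strsec->sh_size + 1);
        if (names != NULL
            && fseek(f, (long)strsec->sh_offset, SEEK_SET) == 0
            && fread(names, 1, strsec->sh_size, f) == strsec->sh_size) {
            names[strsec->sh_size] = '\0';
            for (int i = 0; i < eh.e_shnum; i++) {
                if (sh[i].sh_name < strsec->sh_size
                    && strcmp(names + sh[i].sh_name, want) == 0) {
                    *addr = (unsigned long)sh[i].sh_addr;
                    *size = (unsigned long)sh[i].sh_size;
                    found = 1;
                    break;
                }
            }
        }
    }
    free(names);
    free(sh);
    fclose(f);
    return found;
}
```

Calling find_section("/proc/self/exe", ".rodata", ...) the same way lets you compare where .rodata actually starts relative to the end of .text, instead of assuming the two are adjacent.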

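Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.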