
Dynamically linking a simple binary

I want to write my own ld.so and I want to do it step by step. I could not find any "guide" on how to code my ld.so, so I want to do it myself. I thought I would first try loading a simple binary in memory, like the one below; then call it. It's extremely simple, and it's already not working.

The binary is:

section .text
global _start

_start:
    mov edi, 123
    mov eax, 60
    syscall

calling exit(123):

$ nasm -f elf64 bin.asm && ld bin.o && ./a.out; echo $?
123

The loader:

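    /* Fragment from main(); needs <stdio.h>, <stdlib.h>, <string.h>,
       <sys/mman.h> and <unistd.h>. */
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) {
        fprintf(stderr, "cannot open file %s\n", argv[1]);
        return 1;
    }

    /* Determine the file size. */
    fseek(fp, 0L, SEEK_END);
    size_t sz = ftell(fp);
    rewind(fp);

    char *contents = malloc(sz);
    if (!contents) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    size_t pagesize = getpagesize();
    void *base_addr = (void*) (pagesize * (1 << 20));

    /* Anonymous mappings take fd = -1. Without MAP_FIXED, base_addr is
       only a hint, so the returned pointer is what must be used. */
    char *region = mmap(
            base_addr,
            pagesize,
            PROT_READ | PROT_WRITE | PROT_EXEC,
            MAP_ANON | MAP_PRIVATE,
            -1, 0
            );
    if (region == MAP_FAILED) {
        fprintf(stderr, "could not mmap\n");
        return 1;
    }

    /* Read the whole file, appending after the bytes already read. */
    size_t nread = 0;
    while (nread < sz) {
        size_t n = fread(contents + nread, 1, sz - nread, fp);
        if (n == 0)
            break;
        nread += n;
    }
    if (ferror(fp)) {
        fprintf(stderr, "error reading file %s\n", argv[1]);
        return 1;
    }

    /* The mapping is a single page, so never copy more than that. */
    memcpy(region, contents, nread < pagesize ? nread : pagesize);
    if (mprotect(region, pagesize, PROT_READ | PROT_EXEC)) {
        fprintf(stderr, "mprotect failed\n");
        return 1;
    }

    /* Jump to the address mmap actually returned. */
    return ((int (*)(void)) region)();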

What I expect to happen: my_linker -> binary in memory -> jump to mov edi, 123 -> exit with status 123.

What happens: "SIGSEGV at address 0x0"

I'm running this on Linux x86_64.


EDIT: In response to @Ctx: the code now uses memcpy instead of strncpy.

I should have stated this more clearly. I ran nasm -f elf... only to show that the program does what is expected. The file I pass to the loader as its argument is the flat binary produced by nasm -f bin -o prog.bin ...

Two main issues:

Inappropriate use of strncpy()

Here, you use strncpy() to copy the binary code into your mmap()ped page:

strncpy(region, contents, sz);

But strncpy() stops copying at the first zero byte (and pads the rest of the destination with zeros), and there is probably a zero byte quite early in the binary. You have to use memcpy() for this task!
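To make the difference concrete, here is a minimal sketch. It assumes the 12 bytes below are what the example program assembles to in 64-bit mode (hand-assembled here, so treat the byte values as an assumption), and the helper name copy_is_intact is made up for illustration. The immediate 123 is stored as 7b 00 00 00, so the third byte is already zero.

```c
#include <assert.h>
#include <string.h>

/* Assumed x86-64 encoding of the example program. */
static const unsigned char code[] = {
    0xbf, 0x7b, 0x00, 0x00, 0x00,   /* mov edi, 123 */
    0xb8, 0x3c, 0x00, 0x00, 0x00,   /* mov eax, 60  */
    0x0f, 0x05                      /* syscall      */
};

/* Hypothetical helper: copy the code with strncpy() or memcpy() and
 * report whether the destination is byte-for-byte identical (1) or not (0). */
int copy_is_intact(int use_strncpy)
{
    unsigned char dst[sizeof code];

    assert(sizeof dst == sizeof code);
    memset(dst, 0xcc, sizeof dst);   /* recognizable filler */

    if (use_strncpy)
        /* Stops at the zero byte at index 2 and zero-fills the rest. */
        strncpy((char *)dst, (const char *)code, sizeof code);
    else
        /* Copies exactly sizeof code bytes, zeros included. */
        memcpy(dst, code, sizeof code);

    return memcmp(dst, code, sizeof code) == 0;
}
```

With strncpy(), everything from mov eax, 60 onward is lost, so the copied page no longer contains the syscall at all.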

Second issue:

The ELF format

You assume that the code starts at the beginning of your binary. But here

$ nasm -f elf64 bin.asm && ld bin.o && ./a.out; echo $?

you are linking it into an ELF binary. So it starts with ELF headers, not with the code. There are essentially two possibilities: either calculate the offset of the code from the ELF headers, or use objcopy to extract the pure code from the binary:

objcopy -O binary -j .text a.out bin
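The first option, calculating the offset from the ELF headers, could look roughly like the sketch below. It assumes a Linux system with <elf.h> and a 64-bit ELF file that has already been read into memory; the function name elf64_entry_offset is made up for illustration. It finds the PT_LOAD segment that contains e_entry and translates that virtual address into a file offset.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the file offset of the entry point of a
 * 64-bit ELF image, or -1 if the image is not a usable ELF64 file. */
long elf64_entry_offset(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;

    assert(image != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);              /* avoid unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                              /* not an ELF file */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                              /* not 64-bit */
    if (eh.e_phentsize != sizeof(Elf64_Phdr))
        return -1;

    /* Walk the program headers looking for the segment holding e_entry. */
    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;

        if (off > len || len - off < sizeof ph)
            return -1;                          /* truncated header table */
        memcpy(&ph, image + off, sizeof ph);

        if (ph.p_type != PT_LOAD)
            continue;
        if (eh.e_entry >= ph.p_vaddr &&
            eh.e_entry - ph.p_vaddr < ph.p_filesz) {
            Elf64_Off file_off = ph.p_offset + (eh.e_entry - ph.p_vaddr);
            return file_off < len ? (long)file_off : -1;
        }
    }
    return -1;
}
```

Jumping to that offset inside a copy of the file only works here because the example code is position-independent. A real loader would have to map each PT_LOAD segment at its p_vaddr.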

Edit: You tried to use

nasm -f bin -o prog.bin bin.asm

but this produces 16-bit code by default. You have to explicitly state

bits 64

in your assembler source file to get 64-bit code.

Why use fread()/memcpy()

There is not much point in using fread() to fill a buffer and then memcpy() to copy it again; you can mmap() the binary file directly into memory without reading it first:

char *region = mmap(
        base_addr,
        sz,
        PROT_READ | PROT_EXEC,
        MAP_PRIVATE | MAP_FIXED,
        fileno(fp), 0
        );
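A slightly fuller sketch of the same idea follows, assuming a POSIX system. The helper name map_whole_file is made up for illustration. It takes the size from fstat() instead of fseek()/ftell() and lets the kernel choose the address rather than forcing one with MAP_FIXED.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: map an already opened file privately with the
 * given protection. Returns the mapping and stores its length in
 * *len_out, or returns NULL on failure (including an empty file). */
void *map_whole_file(FILE *fp, int prot, size_t *len_out)
{
    struct stat st;
    int fd;
    void *p;

    assert(fp != NULL && len_out != NULL);

    fd = fileno(fp);
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return NULL;

    /* NULL address: the kernel picks a free, page-aligned location. */
    p = mmap(NULL, (size_t)st.st_size, prot, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

For the loader, prot would be PROT_READ | PROT_EXEC, and the returned pointer (not a hard-coded base address) is what gets cast to a function pointer and called.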
