I want to write my own ld.so and I want to do it step by step. I could not find any "guide" on how to code my ld.so, so I want to do it myself. I thought I would first try loading a simple binary in memory, like the one below; then call it. It's extremely simple, and it's already not working.
The binary is:
section .text
global _start
_start:
mov edi, 123
mov eax, 60
syscall
It calls exit(123):
$ nasm -f elf64 bin.asm && ld bin.o && ./a.out; echo $?
123
The loader:
FILE *fp = fopen(argv[1], "r");
if (!fp) {
fprintf(stderr, "cannot open file %s", argv[1]);
return 1;
}
fseek(fp, 0L, SEEK_END);
size_t sz = ftell(fp) + 1;
rewind(fp);
char *contents = malloc(sizeof(char) * sz);
size_t pagesize = getpagesize();
void *base_addr = (void*) (pagesize * (1 << 20));
char *region = mmap(
base_addr,
pagesize,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANON | MAP_PRIVATE,
-1, 0 /* fd should be -1 for a portable anonymous mapping */
);
if (region == MAP_FAILED) {
fprintf(stderr, "could not mmap");
return 1;
}
for (int i = 1, nread = 0; nread != sz * sizeof(char) && i > 0; nread += i) {
i = fread(contents, sizeof(char), sz, fp);
}
contents[sz - 1] = 0;
if (ferror(fp)) {
fprintf(stderr, "error reading file %s", argv[1]);
return 1;
}
memcpy(region, contents, sz);
if (mprotect(region, pagesize, PROT_READ | PROT_EXEC)) {
fprintf(stderr, "mprotect failed");
return 1;
}
return ((int (*)()) base_addr)();
What I think will happen: my_linker -> binary in memory -> call mov edi, 123, return 123.
What happens: "SIGSEGV at address 0x0"
I'm running this on Linux x86_64.
EDIT: in response to @Ctx: the code now uses memcpy instead of strncpy.
I should have stated this more clearly. I ran nasm -f elf... only to show that the program does what is expected. The file I pass as the program argument is the flat binary produced by nasm -f bin -o prog.bin ...
Two main issues:
Inappropriate use of strncpy()
Here, you use strncpy() to copy the binary code into your mmap()ped page:
strncpy(region, contents, sz);
But strncpy() stops copying at the first zero byte and fills the rest of the destination with zeros, and there is probably a zero byte quite early in the binary. You have to use memcpy() for this task!
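To see the difference concretely, here is a minimal sketch. The byte array is my own hand-assembled encoding of the three instructions from the question (an assumption, not taken from your actual file), and copy_is_intact() is a hypothetical helper name. The immediate 123 is encoded as 7b 00 00 00, so the third byte is already zero.

```c
#include <assert.h>
#include <string.h>

/* Hand-assembled x86-64 bytes for the question's program (assumed encoding). */
static const unsigned char code[] = {
    0xbf, 0x7b, 0x00, 0x00, 0x00,  /* mov edi, 123 */
    0xb8, 0x3c, 0x00, 0x00, 0x00,  /* mov eax, 60  */
    0x0f, 0x05                     /* syscall      */
};

/* Copies `code` with memcpy() (use_memcpy != 0) or strncpy() and returns
 * 1 if the copy is byte-for-byte identical to the source, 0 otherwise. */
int copy_is_intact(int use_memcpy)
{
    unsigned char dst[16];

    assert(sizeof code <= sizeof dst);
    memset(dst, 0xcc, sizeof dst);

    if (use_memcpy)
        memcpy(dst, code, sizeof code);
    else
        /* Stops at the first zero byte (index 2) and zero-fills the rest. */
        strncpy((char *)dst, (const char *)code, sizeof code);

    return memcmp(dst, code, sizeof code) == 0;
}
```

With strncpy() only the first two bytes survive, so the second mov and the syscall are lost. memcpy() copies all twelve bytes.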
Second issue:
The ELF format
You assume that the code starts at the beginning of your binary. But here
$ nasm -f elf64 bin.asm && ld bin.o && ./a.out; echo $?
you are linking it into an ELF binary, so the file starts with ELF headers, not with the code. There are essentially two possibilities: either calculate the offset of the code from the ELF headers, or use objcopy to extract the pure code from the binary:
objcopy -O binary -j .text a.out bin
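For the first possibility, calculating the offset from the ELF headers, a rough sketch could look like the following. It assumes a 64-bit ELF file and the Linux <elf.h> header. entry_file_offset() is a hypothetical helper name, not part of any library. It finds the PT_LOAD segment that contains e_entry and translates the entry address into a file offset.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns the file offset of the ELF entry point, or -1 on any error. */
long entry_file_offset(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;

    Elf64_Ehdr eh;
    long result = -1;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, fp) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    /* Walk the program headers, looking for the segment holding e_entry. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);

        if (fseek(fp, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, fp) != 1)
            break;
        if (ph.p_type == PT_LOAD &&
            eh.e_entry >= ph.p_vaddr &&
            eh.e_entry < ph.p_vaddr + ph.p_filesz) {
            /* file offset = segment offset + distance into the segment */
            result = (long)(ph.p_offset + (eh.e_entry - ph.p_vaddr));
            break;
        }
    }
out:
    fclose(fp);
    return result;
}
```

This only locates the code in the file. A real loader would also have to map each PT_LOAD segment at its p_vaddr with the right permissions, which this sketch does not attempt.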
Edit: You tried to use
nasm -f bin -o prog.bin bin.asm
but this produces 16-bit code by default. You have to explicitly state
bits 64
in your assembler source file to get 64-bit code.
Why use fread()/memcpy()
There is not much point in using fread() into a buffer and memcpy() afterwards; you could map the binary directly into memory with mmap() without reading it:
char *region = mmap(
base_addr,
sz,
PROT_READ | PROT_EXEC,
MAP_PRIVATE | MAP_FIXED,
fileno(fp), 0
);
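Put together, a self-contained version of this idea might look like the sketch below. map_code() is a hypothetical helper name. Unlike the fragment above, it lets the kernel choose the address (NULL, no MAP_FIXED) and uses open()/fstat() rather than a FILE *; both are my assumptions, not requirements.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file read-only and executable.
 * Returns the mapping and stores its length in *len, or returns NULL
 * with errno set if the file cannot be opened, is empty, or cannot be mapped. */
void *map_code(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    void *p = MAP_FAILED;
    if (fstat(fd, &st) == 0) {
        if (st.st_size > 0)
            p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
                     MAP_PRIVATE, fd, 0);
        else
            errno = EINVAL;  /* mmap() rejects zero-length mappings */
    }

    /* The mapping stays valid after the descriptor is closed. */
    int saved = errno;
    close(fd);
    errno = saved;

    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

The caller can then cast the result to a function pointer and call it, as in the question, and release it with munmap(p, len) afterwards. Note that the mapping can fail with EPERM if the file is on a filesystem mounted noexec.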