Build static ELF without libc using unistd.h from Linux headers

I'm interested in building a static ELF program without (g)libc, using unistd.h provided by the Linux headers.

I've read through these articles and questions, which give a rough idea of what I'm trying to do, but not quite:

http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Compiling without libc

https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free

I have basic code which depends only on unistd.h. My understanding is that each of those functions is provided by the kernel, and that libc should not be needed. Here's the path I've taken that seems the most promising:

    $ gcc -I /usr/include/asm/ -nostdlib grabbytes.c -o grabbytesstatic
    /usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
    /tmp/ccn1mSkn.o: In function `main':
    grabbytes.c:(.text+0x38): undefined reference to `open'
    grabbytes.c:(.text+0x64): undefined reference to `lseek'
    grabbytes.c:(.text+0x8f): undefined reference to `lseek'
    grabbytes.c:(.text+0xaa): undefined reference to `read'
    grabbytes.c:(.text+0xc5): undefined reference to `write'
    grabbytes.c:(.text+0xe0): undefined reference to `read'
    collect2: error: ld returned 1 exit status

Before this, I had to manually define SEEK_END and SEEK_SET according to the values found in the kernel headers. Otherwise the compiler would complain that they were not defined, which makes sense.

I imagine that I need to link against an unstripped vmlinux to provide the symbols to use. However, I read through the symbols, and while there were plenty of llseeks, they were not llseek verbatim.

So my question can go in a few directions:

How can I specify an ELF file to take symbols from? And I'm guessing that, if that's possible at all, the symbols won't match up. If this is correct, is there an existing header file which will redefine llseek and default_llseek or whatever is exactly in the kernel?

Is there a better way to write POSIX code in C without a libc?

My goal is to write or port fairly standard C code using (perhaps solely) unistd.h and invoke it without libc. I'm probably okay without a few unistd functions, and am not sure which ones exist "purely" as kernel calls. I love assembly, but that's not my goal here. I'm hoping to stay as strictly C as possible (I'm fine with a few external assembly files if I have to), to allow for a libc-less static system at some point.

Thank you for reading!

If you're looking to write POSIX code in C, the abandonment of libc is not going to be helpful. Although you could implement a syscall function in assembler, and copy structures and defines from the kernel header, you would essentially be writing your own libc, which almost certainly would not be POSIX compliant. With all the great libc implementations out there, there's almost no reason to begin implementing your own.
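To make concrete what "implement a syscall function and copy the defines" would involve, here is a minimal sketch, assuming x86_64 Linux and GCC or Clang inline assembly. The `raw_*` names are made up for this illustration and are not part of any library; the `__NR_*` numbers come from `<asm/unistd.h>` in the Linux headers, which supplies system call numbers but no callable functions.

```c
#include <asm/unistd.h>   /* __NR_* system call numbers from the Linux headers */

/* Issue a raw x86_64 Linux system call with up to three arguments.
   The kernel's result is returned unchanged: on failure it is a
   negative errno value, not -1 with errno set. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");  /* syscall clobbers rcx and r11 */
    return ret;
}

/* Hypothetical stand-ins for the unistd.h functions used in the question. */
long raw_open(const char *path, int flags, int mode)
{
    return raw_syscall3(__NR_open, (long)path, flags, mode);
}

long raw_read(int fd, void *buf, unsigned long len)
{
    return raw_syscall3(__NR_read, fd, (long)buf, (long)len);
}

long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(__NR_write, fd, (long)buf, (long)len);
}

long raw_lseek(int fd, long offset, int whence)
{
    return raw_syscall3(__NR_lseek, fd, offset, whence);
}

long raw_close(int fd)
{
    return raw_syscall3(__NR_close, fd, 0, 0);
}
```

Even this small set shows where the work goes: error reporting, types, and constants such as SEEK_SET all have to be recreated by hand, which is exactly the job a libc already does.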

dietlibc and musl libc are both frugal libc implementations which yield impressively small binaries. The linker is generally smart; as long as a library is written to avoid accidentally pulling in numerous dependencies, only the functions you use will actually be linked into your program.

Here is a simple hello world program:

#include<unistd.h>

int main(){
    char str[] = "Hello, World!\n";
    write(1, str, sizeof str - 1);
    return 0;
}

Compiling it with musl as shown below yields a binary of less than 3K:

$ musl-gcc -Os -static hello.c
$ strip a.out 
$ wc -c a.out
2800 a.out

dietlibc produces an even smaller binary, less than 1.5K:

$ diet -Os gcc hello.c
$ strip a.out 
$ wc -c a.out
1360 a.out

This is far from ideal, but a little bit of (x86_64) assembler has me down to just under 5KB. Most of that is "other things than code": the actual code is under 1KB (771 bytes to be precise), but the file size is much larger, I think because the code size is rounded to 4KB and then some header/footer/extra stuff is added to that.

Here's what I did: gcc -g -static -nostdlib -o glibc start.s glibc.c -Os -lc

glibc.c contains:

#include <unistd.h>

int main()
{
    const char str[] = "Hello, World!\n";
    write(1, str, sizeof(str) - 1);  /* - 1 so the terminating NUL is not written */

    _exit(0);
}

start.s contains:

    .globl _start
_start: 
    xor %ebp, %ebp
    mov %rdx, %r9
    mov %rsp, %rdx
    and $~15, %rsp          # align the stack to 16 bytes
    push    $0
    push    %rsp

    call    main

    hlt


    .globl _exit
_exit:
    //  We know %RDI already has the exit code...
    mov $0x3c, %eax
    syscall
    hlt

The main point of this is to show that it's not the system call part of glibc that takes up a lot of space, but the "prepare things" part. Beware that if you were to call, for example, printf, possibly even (v)sprintf, or exit(), or any other "standard library" function, you are in the land of "nobody knows what will happen".

Edit: Updated "start.s" to put argc/argv in the right places:
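If all you need is to print a number, one way to stay out of that territory is a helper that depends on nothing but the compiler. The following is only a sketch; `fmt_ulong` is a name invented here, not a libc function, and its output would still have to be passed to write() yourself.

```c
/* Write the decimal form of value into buf and return the number of
   digits written. buf must have room for at least 21 bytes
   (20 digits for 2^64 - 1 plus the terminating NUL). */
unsigned long fmt_ulong(char *buf, unsigned long value)
{
    char tmp[20];
    unsigned long n = 0, i;

    /* Collect digits from least to most significant. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Copy them back out in reading order. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

A call such as `write(1, buf, fmt_ulong(buf, 12345))` then covers the common case without touching any of the library's initialisation-dependent code.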

_start: 
    xor %ebp, %ebp
    mov %rdx, %r9
    pop     %rdi
    mov %rsp, %rsi
    and $~15, %rsp          # align the stack to 16 bytes
    push    %rax
    push    %rsp

    // %rdi = argc, %rsi=argv
    call    main

Note that I've changed which register contains what, so that it matches main. I had them in slightly the wrong order in the previous code.
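For reference, the layout those `pop %rdi` / `mov %rsp, %rsi` instructions rely on can be written down in C. This is an illustrative sketch only: `struct start_args` and `parse_initial_stack` are hypothetical names, and it assumes the x86_64 Linux convention that the stack pointer at process entry points at argc, followed by the argv pointers, a NULL, the environment pointers, and another NULL.

```c
/* The three things a startup routine typically hands to main. */
struct start_args {
    long   argc;
    char **argv;
    char **envp;
};

/* Decode the initial stack as seen by _start. sp is the value of %rsp
   at process entry; every slot is one 8-byte word. */
struct start_args parse_initial_stack(void *sp)
{
    char **slots = (char **)sp;
    struct start_args a;

    a.argc = (long)slots[0];          /* first word: argument count */
    a.argv = slots + 1;               /* argv[0] .. argv[argc-1], then NULL */
    a.envp = a.argv + a.argc + 1;     /* environment starts after argv's NULL */
    return a;
}
```

In the assembly above, `pop %rdi` consumes the argc word, which is why `%rsp` points directly at argv[0] when it is copied into `%rsi`.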
