
Clang 11 and GCC 8 O2 Breaks Inline Assembly

I have a short snippet of code with some inline assembly that prints argv[0] correctly at -O0 but misbehaves at -O2: built with Clang it prints nothing, and built with GCC it prints the string stored in envp[0] where argv[0] should appear. The problem is restricted to argv (the other two function parameters can be used as expected with or without optimizations enabled). I tested this with both GCC and Clang, and both compilers have this issue.

Here is the code:

void exit(unsigned long long status) {
    asm volatile("movq $60, %%rax;" //system call 60 is exit
        "movq %0, %%rdi;" //return code 0
        "syscall"
        : //no outputs
        :"r"(status)
        :"rax", "rdi");
}

int open(const char *pathname, unsigned long long flags) {
    asm volatile("movq $2, %%rax;" //system call 2 is open
        "movq %0, %%rdi;"
        "movq %1, %%rsi;"
        "syscall"
        : //no outputs
        :"r"(pathname), "r"(flags)
        :"rax", "rdi", "rsi");
        return 1;
}

int write(unsigned long long fd, const void *buf, size_t count) {
    asm volatile("movq $1, %%rax;" //system call 1 is write
        "movq %0, %%rdi;"
        "movq %1, %%rsi;"
        "movq %2, %%rdx;"
        "syscall"
        : //no outputs
        :"r"(fd), "r"(buf), "r"(count)
        :"rax", "rdi", "rsi", "rdx");
        return 1;
}

static void entry(unsigned long long argc, char** argv, char** envp);

/*https://www.systutorials.com/x86-64-calling-convention-by-gcc/: "The calling convention of the System V AMD64 ABI is followed on GNU/Linux. The registers RDI, RSI, RDX, RCX, R8, and R9 are used for integer and memory address arguments
and XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments.
For system calls, R10 is used instead of RCX. Additional arguments are passed on the stack and the return value is stored in RAX."*/

//__attribute__((naked)) defines a pure-assembly function
__attribute__((naked)) void _start() {
    asm volatile("xor %%rbp,%%rbp;" //http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html: "%ebp,%ebp sets %ebp to zero. This is suggested by the ABI (Application Binary Interface specification), to mark the outermost frame."
    "pop %%rdi;" //rdi: arg1: argc -- can be popped off the stack because it is copied onto register
    "mov %%rsp, %%rsi;" //rsi: arg2: argv
    "mov %%rdi, %%rdx;"
    "shl $3, %%rdx;" //each argv pointer takes up 8 bytes (so multiply argc by 8)
    "add $8, %%rdx;" //add size of null word at end of argv-pointer array (8 bytes)
    "add %%rsp, %%rdx;" //rdx: arg3: envp
    "andq $-16, %%rsp;" //align stack to 16-bits (which is required on x86-64)
    "jmp %P0" //https://stackoverflow.com/questions/3467180/direct-c-function-call-using-gccs-inline-assembly: "After looking at the GCC source code, it's not exactly clear what the code P in front of a constraint means. But, among other things, it prevents GCC from putting a $ in front of constant values. Which is exactly what I need in this case."
    :
    :"i"(entry)
    :"rdi", "rsp", "rsi", "rdx", "rbp", "memory");
}

//Function cannot be optimized-away, since it is passed-in as an argument to asm-block above
//Compiler Options: -fno-asynchronous-unwind-tables;-O2;-Wall;-nostdlibinc;-nobuiltininc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++
//Linker Options: -nostdlib; -nodefaultlibs
static void entry(unsigned long long argc, char** argv, char** envp) {
    int ttyfd = open("/dev/tty", O_WRONLY);

    write(ttyfd, argv[0], 9);
    write(ttyfd, "\n", 1);

    exit(0);
}

Edit: Added syscall definitions.

Edit: Adding rcx and r11 to the clobber list for the syscalls fixed the issue for Clang, but GCC continued to have the error.

Edit: GCC was not actually at fault. Some strange error in my build system (CodeLite) caused it to run a partially built program, even though GCC had reported errors about not recognizing two of the compiler flags passed in. For GCC, use these flags instead: -fomit-frame-pointer;-fno-asynchronous-unwind-tables;-O2;-Wall;-nostdinc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++. You can also use these flags for Clang, due to Clang's support for the above GCC options.

  1. You can't use extended asm in a naked function, only basic asm, according to the gcc manual. You don't need to inform the compiler of clobbered registers (since it won't do anything about them anyway; in a naked function you are responsible for all register management). And passing the address of entry in an extended operand is unnecessary; just do jmp entry.

    (In my tests your code doesn't compile at all, so I assume you weren't showing us your exact code - next time please do, so as to avoid wasting people's time.)
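A minimal sketch of point 1, assuming x86-64 Linux with GCC or Clang. So that it can run inside an ordinary hosted program, it does not replace _start; instead the hypothetical start_on receives a pointer to a fake initial stack in %rdi and passes argc, argv and envp on to a hypothetical record_entry. The naked function uses only basic asm: no operands, no clobber list, a single % on register names, and the jump target named directly. In a real _start the same address arithmetic would be done relative to %rsp.

```c
/* Assumes x86-64 Linux, GCC or Clang. All names here are hypothetical. */
unsigned long seen_argc;
char **seen_argv;
char **seen_envp;

/* `used` keeps this function alive: its only reference is inside an asm
   string, which the compiler does not parse. */
__attribute__((used))
static void record_entry(unsigned long argc, char **argv, char **envp) {
    seen_argc = argc;
    seen_argv = argv;
    seen_envp = envp;
}

/* fake_sp arrives in %rdi and points at: argc, argv[0..argc-1], NULL, envp..., NULL */
__attribute__((naked)) void start_on(const void *fake_sp) {
    __asm__(
        "mov (%rdi), %rax\n\t"           /* argc is the first word */
        "lea 8(%rdi), %rsi\n\t"          /* argv starts right after argc */
        "lea 16(%rdi,%rax,8), %rdx\n\t"  /* envp follows argv's NULL terminator */
        "mov %rax, %rdi\n\t"             /* first argument: argc */
        "jmp record_entry"               /* symbol named directly, no operand needed */
    );
}
```

The used attribute matters because a static function that is referenced only from an asm string may otherwise be discarded by the compiler.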

  2. On Linux x86-64, the syscall instruction is allowed to clobber the rcx and r11 registers, so you need to add those to the clobber lists of your system calls.
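As an illustration of point 2, here is a small sketch of a system call wrapper with a complete clobber list. It assumes x86-64 Linux, where getpid is system call 39; the name my_getpid is hypothetical. The "memory" clobber is not strictly needed for getpid, but it is a conservative default for wrappers whose system calls read or write user buffers.

```c
/* Hypothetical wrapper; assumes x86-64 Linux, where getpid is system call 39. */
long my_getpid(void) {
    long ret = 39;                  /* syscall number goes in, result comes out, both in rax */
    __asm__ volatile("syscall"
        : "+a"(ret)                 /* read-write operand pinned to rax */
        :                           /* no other inputs */
        : "rcx", "r11", "memory");  /* the syscall instruction overwrites rcx and r11 */
    return ret;
}
```

Without rcx and r11 in the list, the compiler is free to keep a live value in either register across the asm statement, and that value is then lost when the kernel returns.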

  3. You align the stack to a 16-byte boundary before jumping to entry. However, the 16-byte alignment rule is based on the assumption that you will be calling the function with call, which pushes an additional 8 bytes (the return address) onto the stack. As such, the called function actually expects the stack pointer on entry to be not a multiple of 16 but 8 more (equivalently, 8 less) than a multiple of 16. So you are actually aligning the stack incorrectly, and this can be a cause of all sorts of mysterious trouble.

    So either replace your jmp with call, or else subtract a further 8 bytes from rsp (or just push some 64-bit register of your choice).
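The alignment rule in point 3 can be observed directly. The sketch below assumes x86-64 with the System V ABI and GCC or Clang; the name rsp_mod16_at_entry is hypothetical. It is a naked function that reports the stack pointer modulo 16 exactly as a callee first sees it.

```c
/* Hypothetical probe; assumes x86-64 with the System V ABI, GCC or Clang. */
__attribute__((naked)) unsigned long rsp_mod16_at_entry(void) {
    __asm__(
        "mov %rsp, %rax\n\t"  /* stack pointer exactly as the callee first sees it */
        "and $15, %eax\n\t"   /* keep the low four bits: rsp modulo 16 */
        "ret"
    );
}
```

Called from ABI-conforming code, this should return 8: the caller's stack is 16-byte aligned at the call instruction, and the pushed return address moves it 8 bytes off. A function reached by jmp from a freshly aligned stack would see 0 instead, which is the state entry is in with the code in the question.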

  4. Style note: unsigned long is already 64 bits on Linux x86-64, so it would be more idiomatic to use that in place of unsigned long long everywhere.

  5. General hint: learn about register constraints in extended asm. You can have the compiler load your desired registers for you, instead of writing instructions in your asm to do it yourself. So your exit function could instead look like:

    void exit(unsigned long status) {
        asm volatile("syscall"
            : //no outputs
            :"a"(60), "D" (status)
            :"rcx", "r11");
    }

This in particular saves you a few instructions, since status is already in the %rdi register on function entry. With your original code, the compiler has to move it somewhere else so that you can then load it into %rdi yourself.

  6. Your open function always returns 1, which will typically not be the fd that was actually opened. So if your program is run with standard output redirected, it will write to the redirected stdout instead of to the tty, as it seems to want to do. Indeed, this makes the open syscall completely pointless, because you never use the file you opened.

    You should arrange for open to return the value that was actually returned by the system call, which will be left in the %rax register when syscall returns. You can use an output operand to have this stored in a temporary variable (which the compiler will likely optimize out), and return that. You'll need to use a digit constraint since it is going in the same register as an input operand. I leave this as an exercise for you. It would likewise be nice if your write function actually returned the number of bytes written.
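One possible solution to that exercise, as a sketch assuming x86-64 Linux (open is system call 2, write is system call 1). The names my_open and my_write are hypothetical, chosen to avoid clashing with libc. The mode parameter is an addition not present in the question's code: the kernel's open takes a third argument, which only matters when a file is being created.

```c
#include <stddef.h>

/* Hypothetical wrappers; assume x86-64 Linux system call numbers. */
long my_open(const char *pathname, unsigned long flags, unsigned long mode) {
    long ret;
    __asm__ volatile("syscall"
        : "=a"(ret)                       /* result comes back in rax */
        : "0"(2L),                        /* digit constraint: same register as operand 0 */
          "D"(pathname), "S"(flags), "d"(mode)
        : "rcx", "r11", "memory");
    return ret;                           /* fd on success, negative errno on failure */
}

long my_write(unsigned long fd, const void *buf, size_t count) {
    long ret;
    __asm__ volatile("syscall"
        : "=a"(ret)
        : "0"(1L), "D"(fd), "S"(buf), "d"(count)
        : "rcx", "r11", "memory");
    return ret;                           /* bytes written, or negative errno */
}
```

With these, entry could keep the value returned by my_open("/dev/tty", 1, 0), where 1 is O_WRONLY on Linux, check that it is not negative, and pass it to my_write. Raw system calls report failure as a negative errno value in rax rather than through the errno variable.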
