How do I use labels in GCC inline assembly?

Question

I'm trying to learn x86-64 inline assembly and decided to implement this very simple swap function, which simply puts a and b in ascending order:

#include <stdio.h>

void swap(int* a, int* b)
{
    asm(".intel_syntax noprefix");
    asm("mov    eax, DWORD PTR [rdi]");
    asm("mov    ebx, DWORD PTR [rsi]");
    asm("cmp    eax, ebx");
    asm("jle    .L1");
    asm("mov    DWORD PTR [rdi], ebx");
    asm("mov    DWORD PTR [rsi], eax");
    asm(".L1:");
    asm(".att_syntax noprefix");
}

int main()
{
    int input[3];

    scanf("%d%d%d", &input[0], &input[1], &input[2]);

    swap(&input[0], &input[1]);
    swap(&input[1], &input[2]);
    swap(&input[0], &input[1]);

    printf("%d %d %d\n", input[0], input[1], input[2]);

    return 0;
}

The above code works as expected when I compile and run it with these commands:

> gcc main.c
> ./a.out
> 3 2 1
> 1 2 3

However, as soon as I turn optimization on I get the following error messages:

> gcc -O2 main.c
> main.c: Assembler messages:
> main.c:12: Error: symbol `.L1' is already defined
> main.c:12: Error: symbol `.L1' is already defined
> main.c:12: Error: symbol `.L1' is already defined

If I've understood it correctly, this is because gcc tries to inline my swap function when optimization is turned on, causing the label .L1 to be defined multiple times in the assembly file.

I've tried to find an answer to this problem, but nothing seems to work. In this previously asked question it's suggested to use local labels instead, and I've tried that as well:

#include <stdio.h>

void swap(int* a, int* b)
{
    asm(".intel_syntax noprefix");
    asm("mov    eax, DWORD PTR [rdi]");
    asm("mov    ebx, DWORD PTR [rsi]");
    asm("cmp    eax, ebx");
    asm("jle    1f");
    asm("mov    DWORD PTR [rdi], ebx");
    asm("mov    DWORD PTR [rsi], eax");
    asm("1:");
    asm(".att_syntax noprefix");
}

But when I run the program I now get a segmentation fault instead:

> gcc -O2 main.c
> ./a.out
> 3 2 1
> Segmentation fault

I also tried the solution suggested in this previously asked question and changed the name .L1 to CustomLabel1 in case there was a name collision, but it still gives me the old error:

> gcc -O2 main.c
> main.c: Assembler messages:
> main.c:12: Error: symbol `CustomLabel1' is already defined
> main.c:12: Error: symbol `CustomLabel1' is already defined
> main.c:12: Error: symbol `CustomLabel1' is already defined

Finally, I also tried this suggestion:

void swap(int* a, int* b)
{
    asm(".intel_syntax noprefix");
    asm("mov    eax, DWORD PTR [rdi]");
    asm("mov    ebx, DWORD PTR [rsi]");
    asm("cmp    eax, ebx");
    asm("jle    label%=");
    asm("mov    DWORD PTR [rdi], ebx");
    asm("mov    DWORD PTR [rsi], eax");
    asm("label%=:");
    asm(".att_syntax noprefix");
}

But then I get these errors instead:

main.c: Assembler messages:
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic
main.c:9: Error: invalid character '=' in operand 1
main.c:12: Error: invalid character '%' in mnemonic

So, my question is:

How can I use labels in inline assembly?

This is the assembly output (gcc -S) for the optimized version:

> gcc -O2 -S main.c

    .file   "main.c"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB0:
    .text
.LHOTB0:
    .p2align 4,,15
    .globl  swap
    .type   swap, @function
swap:
.LFB23:
    .cfi_startproc
#APP
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
#NO_APP
    ret
    .cfi_endproc
.LFE23:
    .size   swap, .-swap
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC1:
    .string "%d%d%d"
.LC2:
    .string "%d %d %d\n"
    .section    .text.unlikely
.LCOLDB3:
    .section    .text.startup,"ax",@progbits
.LHOTB3:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB24:
    .cfi_startproc
    subq    $40, %rsp
    .cfi_def_cfa_offset 48
    movl    $.LC1, %edi
    movq    %fs:40, %rax
    movq    %rax, 24(%rsp)
    xorl    %eax, %eax
    leaq    8(%rsp), %rcx
    leaq    4(%rsp), %rdx
    movq    %rsp, %rsi
    call    __isoc99_scanf
#APP
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
# 5 "main.c" 1
    .intel_syntax noprefix
# 0 "" 2
# 6 "main.c" 1
    mov eax, DWORD PTR [rdi]
# 0 "" 2
# 7 "main.c" 1
    mov ebx, DWORD PTR [rsi]
# 0 "" 2
# 8 "main.c" 1
    cmp eax, ebx
# 0 "" 2
# 9 "main.c" 1
    jle 1f
# 0 "" 2
# 10 "main.c" 1
    mov DWORD PTR [rdi], ebx
# 0 "" 2
# 11 "main.c" 1
    mov DWORD PTR [rsi], eax
# 0 "" 2
# 12 "main.c" 1
    1:
# 0 "" 2
# 13 "main.c" 1
    .att_syntax noprefix
# 0 "" 2
#NO_APP
    movl    8(%rsp), %r8d
    movl    4(%rsp), %ecx
    movl    $.LC2, %esi
    movl    (%rsp), %edx
    xorl    %eax, %eax
    movl    $1, %edi
    call    __printf_chk
    movq    24(%rsp), %rsi
    xorq    %fs:40, %rsi
    jne .L6
    xorl    %eax, %eax
    addq    $40, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 8
    ret
.L6:
    .cfi_restore_state
    call    __stack_chk_fail
    .cfi_endproc
.LFE24:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE3:
    .section    .text.startup
.LHOTE3:
    .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

Answer 1

There are plenty of tutorials - including this one (probably the best I know of), and some info on operand size modifiers.

Here's the first implementation - swap_2:

void swap_2 (int *a, int *b)
{
    int tmp0, tmp1;

    __asm__ volatile (
        "movl (%0), %k2\n\t" /* %2 (tmp0) = (*a) */
        "movl (%1), %k3\n\t" /* %3 (tmp1) = (*b) */
        "cmpl %k3, %k2\n\t"
        "jle  %=f\n\t"       /* if (%2 <= %3) (at&t!) */
        "movl %k3, (%0)\n\t"
        "movl %k2, (%1)\n\t"
        "%=:\n\t"

        : "+r" (a), "+r" (b), "=r" (tmp0), "=r" (tmp1) :
        : "memory" /* "cc" */ );
}

A few notes:

volatile (or __volatile__) is required, as the compiler only 'sees' (a) and (b) (and doesn't 'know' you're potentially exchanging their contents), and would otherwise be free to optimize the whole asm statement away - tmp0 and tmp1 would otherwise be considered unused variables too.
"+r" means that this is both an input and output that may be modified; only it isn't modified in this case, and the operands could strictly be input only - more on that in a bit...
The 'l' suffix on 'movl' isn't really necessary; neither is the 'k' (32-bit) length modifier for the registers. Since you're using the Linux (ELF) ABI, an int is 32 bits for both IA32 and x86-64 ABIs.
The %= token generates a unique label for us: it expands to a number that is unique to each instance of the asm statement, which is used here as a numeric local label. BTW, the jump syntax <label>f means a forward jump, and <label>b means back.
For correctness, we need "memory" as the compiler has no way of knowing if values from dereferenced pointers have been changed. This may be an issue in more complex inline asm surrounded by C code, as it invalidates all currently held values in memory - and is often a sledgehammer approach. Appearing at the end of a function in this fashion, it's not going to be an issue - but you can read more on it here (see: Clobbers).
The "cc" flags register clobber is detailed in the same section. On x86, it does nothing. Some writers include it for clarity, but since practically all non-trivial asm statements affect the flags register, it's just assumed to be clobbered by default.
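One more note on the label%= attempt in the question: %= is only expanded in extended asm, that is, an asm statement with at least one colon. The question's statements were basic asm, so the assembler received the literal characters % and =, which matches the error messages. A minimal sketch of a named, unique label follows, assuming GCC or Clang targeting x86 with the default AT&T syntax on an ELF platform. The function name sort_pair and the label prefix .Lskip are made up for illustration, and the #else branch is only there so the sketch still builds on other targets.

```c
/* Sketch only: puts *a and *b in ascending order using a named unique label. */
void sort_pair(int *a, int *b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int x, y;

    __asm__ volatile (
        "mov (%[a]), %[x]\n\t"   /* x = *a */
        "mov (%[b]), %[y]\n\t"   /* y = *b */
        "cmp %[y], %[x]\n\t"     /* AT&T operand order: compares x with y */
        "jle .Lskip%=\n\t"       /* x <= y: already ordered, skip the stores */
        "mov %[y], (%[a])\n\t"
        "mov %[x], (%[b])\n\t"
        ".Lskip%=:"              /* %= expands to a number unique to this asm instance */
        : [x] "=&r" (x), [y] "=&r" (y)
        : [a] "r" (a), [b] "r" (b)
        : "memory", "cc");
#else
    /* Fallback for other compilers or targets. */
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
#endif
}
```

Because every inlined copy of the statement gets its own number, the label is never defined twice, even at -O2.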

Here's the C implementation - swap_1:

void swap_1 (int *a, int *b)
{
    if (*a > *b)
    {
        int t = *a; *a = *b; *b = t;
    }
}

Compiling with gcc -O2 for x86-64 ELF, I get identical code. It's just a bit of luck that the compiler chose the same free registers for tmp0 and tmp1 as it did for the temporaries in the C version. Cutting out the noise, like the .cfi directives, etc., gives:

swap_2:
        movl (%rdi), %eax
        movl (%rsi), %edx
        cmpl %edx, %eax
        jle  21f
        movl %edx, (%rdi)
        movl %eax, (%rsi)
        21:
        ret

As stated, the swap_1 code was identical, except that the compiler chose .L1 for its jump label. Compiling the code with -m32 generated the same code (apart from using the tmp registers in a different order). There's more overhead, as the IA32 ELF ABI passes parameters on the stack, whereas the x86-64 ABI passes the first two parameters in %rdi and %rsi respectively.

Treating (a) and (b) as input only - swap_3:

void swap_3 (int *a, int *b)
{
    int tmp0, tmp1;

    __asm__ volatile (
        "mov (%[a]), %[x]\n\t" /* x = (*a) */
        "mov (%[b]), %[y]\n\t" /* y = (*b) */
        "cmp %[y], %[x]\n\t"
        "jle  %=f\n\t"         /* if (x <= y) (at&t!) */
        "mov %[y], (%[a])\n\t"
        "mov %[x], (%[b])\n\t"
        "%=:\n\t"

        : [x] "=&r" (tmp0), [y] "=&r" (tmp1)
        : [a] "r" (a), [b] "r" (b) : "memory" /* "cc" */ );
}

I've done away with the 'l' suffix and 'k' modifiers here, because they're not needed. I've also used the 'symbolic name' syntax for operands, as it often helps to make the code more readable.

(a) and (b) are now indeed input-only registers. So what does the "=&r" syntax mean? The & denotes an early clobber operand. In this case, the output may be written to before we finish using the input operands, and therefore the compiler must choose registers different from those selected for the input operands.
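To make the early-clobber rule concrete, here is a small sketch, assuming GCC or Clang on x86 with AT&T syntax. The helper sum_two is hypothetical and not part of the original answer. The output is written by the first instruction while q is still needed by the second one, so the & keeps the compiler from placing out in the register that holds q.

```c
/* Sketch only: returns *p + *q. */
int sum_two(const int *p, const int *q)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int out;

    __asm__ (
        "mov (%[p]), %[out]\n\t"  /* out is written here ... */
        "add (%[q]), %[out]"      /* ... but q is still read here */
        : [out] "=&r" (out)       /* '&': out must not share a register with p or q */
        : [p] "r" (p), [q] "r" (q)
        : "memory", "cc");        /* "memory": the asm reads through the pointers */
    return out;
#else
    return *p + *q;               /* fallback for other compilers or targets */
#endif
}
```

Without the &, the compiler would be allowed to assign out and q to the same register, and the second instruction would then dereference a clobbered pointer.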

Once again, the compiler generates code identical to what it produced for swap_1 and swap_2.

I wrote way more than I planned on this answer, but as you can see, it's very difficult to keep track of all the information the compiler must be made aware of, as well as the idiosyncrasies of each instruction set (ISA) and ABI.

Answer 2

You cannot just put a bunch of asm statements inline like that. The optimizer is free to re-order, duplicate, and drop them based on what constraints it knows. (In your case, it knows none.)

So firstly, you should consolidate the asm together, with proper read/write/clobber constraints. Secondly, there is a special asm goto form that gives the assembly access to C-level labels.

void swap(int *a, int *b) {
    int tmp1, tmp2;
    asm(
        "mov (%2), %0\n"
        "mov (%3), %1\n"
        : "=&r" (tmp1), "=r" (tmp2)   // '&': tmp1 is written before b (%3) is read
        : "r" (a), "r" (b)
        : "memory"   // pointer in register doesn't imply that the pointed-to memory has to be "in sync"
        // or use "m" memory source operands to let the compiler pick the addressing mode
    );
    asm goto(
        "cmp %1, %0\n"
        "jle %l4\n"
        "mov %1, (%2)\n"
        "mov %0, (%3)\n"
        :
        : "r" (tmp1), "r" (tmp2), "r" (a), "r" (b)
        : "cc", "memory"
        : L1
    );
L1:
    return;
}
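The comment in the first asm statement mentions memory operands as an alternative to the "memory" clobber. A rough sketch of that variant follows, assuming GCC or Clang on x86 with the default AT&T syntax. The name swap_mem is hypothetical. Declaring *a and *b as "+m" operands tells the compiler exactly which objects are read and written, and a numeric local label such as 1: may be defined any number of times in one assembly file, so it survives inlining.

```c
/* Sketch only: orders *a and *b ascending, using memory operands. */
void swap_mem(int *a, int *b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int x, y;

    __asm__ (
        "mov %[ma], %[x]\n\t"    /* x = *a */
        "mov %[mb], %[y]\n\t"    /* y = *b */
        "cmp %[y], %[x]\n\t"
        "jle 1f\n\t"             /* x <= y: nothing to do */
        "mov %[y], %[ma]\n\t"    /* *a = y */
        "mov %[x], %[mb]\n\t"    /* *b = x */
        "1:"                     /* numeric local label, safe to repeat */
        : [x] "=&r" (x), [y] "=&r" (y),
          [ma] "+m" (*a), [mb] "+m" (*b)
        :
        : "cc");
#else
    /* Fallback for other compilers or targets. */
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
#endif
}
```

The early-clobber marks on x and y also keep them out of any register the compiler uses to form the addresses of *a and *b.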

Answer 3

You cannot assume values are in any particular register in your asm code -- you need to use constraints to tell gcc what values you want to read and write, and get it to tell you which registers they are in. The gcc docs tell you most of what you need to know, but are pretty dense. There are also tutorials out there that you can easily find with a web search (here or here).
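As a rough illustration of that point, assuming GCC or Clang on x86 with AT&T syntax (the name swap_fixed is hypothetical): if the asm really must use specific scratch registers, they can be named in the clobber list, while the pointers still arrive through constraints instead of being assumed to sit in rdi and rsi. Note that the question's code also overwrote ebx, which is callee-saved in the x86-64 System V ABI, without telling the compiler.

```c
/* Sketch only: orders *a and *b ascending with fixed scratch registers. */
void swap_fixed(int *a, int *b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile (
        "movl (%[a]), %%eax\n\t"   /* eax = *a */
        "movl (%[b]), %%ecx\n\t"   /* ecx = *b */
        "cmpl %%ecx, %%eax\n\t"
        "jle 1f\n\t"               /* *a <= *b: already ordered */
        "movl %%ecx, (%[a])\n\t"
        "movl %%eax, (%[b])\n\t"
        "1:"
        :                          /* no register outputs */
        : [a] "r" (a), [b] "r" (b) /* the compiler picks the pointer registers */
        : "eax", "ecx", "memory", "cc");  /* declare everything the asm overwrites */
#else
    /* Fallback for other compilers or targets. */
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
#endif
}
```

Listing eax and ecx as clobbers keeps the compiler from placing a or b in them and from expecting live values there to survive the statement.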

Answer 1: score 7, accepted, 2017-03-09 03:14:44

Answer 2: score 3, 2017-03-08 21:43:15

Answer 3: score 0, 2017-03-09 01:32:28
