Why is this inline assembly not working with a separate asm volatile statement for each instruction?

For the following code:

long buf[64];

register long rrax asm ("rax");
register long rrbx asm ("rbx");
register long rrsi asm ("rsi");

rrax = 0x34;
rrbx = 0x39;

__asm__ __volatile__ ("movq $buf,%rsi");
__asm__ __volatile__ ("movq %rax, 0(%rsi);");
__asm__ __volatile__ ("movq %rbx, 8(%rsi);");

printf( "buf[0] = %lx, buf[1] = %lx!\n", buf[0], buf[1] );

I get the following output:

buf[0] = 0, buf[1] = 346161cbc0!

while it should have been:

buf[0] = 34, buf[1] = 39!

Any ideas why it is not working properly, and how to solve it?

You clobber memory but don't tell GCC about it, so GCC can cache values in buf across assembly calls. If you want to use inputs and outputs, tell GCC about everything.

__asm__ (
    "movq %1, 0(%0)\n\t"
    "movq %2, 8(%0)"
    :                                /* Outputs (none) */
    : "r"(buf), "r"(rrax), "r"(rrbx) /* Inputs */
    : "memory");                     /* Clobbered */

You also generally want to let GCC handle most of the mov, register selection, etc. Even if you explicitly constrain the registers (rrax is still %rax), let the information flow through GCC or you will get unexpected results.
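To make that concrete, here is a minimal sketch of the same fix wrapped in a function. The name store_pair is made up for illustration, and the #else branch is only there so the sketch still builds on targets other than x86-64.

```c
/* Hypothetical helper: stores a into dst[0] and b into dst[1]. */
void store_pair(long *dst, long a, long b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
        "movq %1, 0(%0)\n\t"
        "movq %2, 8(%0)"
        :                            /* no outputs */
        : "r"(dst), "r"(a), "r"(b)   /* GCC picks the three registers */
        : "memory");                 /* the asm writes through dst */
#else
    /* Portable fallback so the sketch builds elsewhere. */
    dst[0] = a;
    dst[1] = b;
#endif
}
```

Calling store_pair(buf, 0x34, 0x39) and then printing buf[0] and buf[1] should give the 34 and 39 the question was expecting, with or without optimization.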

__volatile__ is wrong.

The reason __volatile__ exists is so you can guarantee that the compiler places your code exactly where it is... which is a completely unnecessary guarantee for this code. It's necessary for implementing advanced features such as memory barriers, but almost completely worthless if you are only modifying memory and registers.
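For comparison, this is roughly what the memory-barrier case looks like. It is only a sketch: payload, ready and publish are hypothetical names, and the barrier shown is a compiler-level barrier only (it emits no CPU fence instruction).

```c
/* Hypothetical example: write a value, then raise a flag. */
int payload;
int ready;

void publish(int value)
{
    payload = value;
    /* Empty asm: no instruction is emitted. The "memory" clobber tells GCC
       that memory may be read or written here, so it must not move the two
       stores across this point or keep either value cached in a register. */
    __asm__ __volatile__ ("" : : : "memory");
    ready = 1;
}
```

Here the position of the statement is the whole point, which is the kind of guarantee described above. The code in the question needs nothing of the sort.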

GCC already knows that it can't move this assembly after printf because the printf call accesses buf, and buf could be clobbered by the assembly. GCC already knows that it can't move the assembly before rrax = 0x34; because rax is an input to the assembly code. So what does __volatile__ get you? Nothing.

If your code does not work without __volatile__ then there is an error in the code which should be fixed instead of just adding __volatile__ and hoping that makes everything better. The __volatile__ keyword is not magic and should not be treated as such.

Alternative fix:

Is __volatile__ necessary for your original code? No. Just mark the inputs and clobber values correctly.

/* The "S" constraint means %rsi, "b" means %rbx, and "a" means %rax
   The inputs and clobbered values are specified.  There is no output
   so that section is blank.  */
rrsi = (long) buf;
__asm__ ("movq %%rax, 0(%%rsi)" : : "a"(rrax), "S"(rrsi) : "memory");
__asm__ ("movq %%rbx, 8(%%rsi)" : : "b"(rrbx), "S"(rrsi) : "memory");
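Fixed-register constraints like "a", "b" and "S" earn their keep when an instruction hard-codes its registers. As a sketch that is not part of the original answer, cpuid is such an instruction; cpu_vendor is a hypothetical helper name, and the function simply returns 0 on targets other than x86-64.

```c
#include <string.h>

/* Hypothetical helper: copies the 12-byte CPU vendor string into out.
   Returns 1 on x86-64 with GCC/Clang, 0 elsewhere (out is then empty). */
int cpu_vendor(char out[13])
{
    out[0] = '\0';
#if defined(__GNUC__) && defined(__x86_64__)
    unsigned int eax = 0, ebx, ecx = 0, edx;
    /* cpuid reads %eax (leaf) and %ecx (subleaf) and overwrites %eax,
       %ebx, %ecx and %edx, so every operand names its register. */
    __asm__ ("cpuid"
             : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    /* Leaf 0 returns the vendor string in ebx, edx, ecx (in that order). */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    return 0;
#endif
}
```

Because all four registers appear as operands, GCC knows what goes in and what comes out, and no __volatile__ is involved.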

Why __volatile__ doesn't help you here:

rrax = 0x34; /* Dead code */

GCC is well within its rights to completely delete the above line, since the code in the question above claims that it never uses rrax.

A clearer example

long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ __volatile__ ("movq %rax, (global)");
}

The disassembly is more or less as you expect it at -O0,

movl $5, %eax
movq %rax, (global)

But with optimization off, you can be fairly sloppy about assembly. Let's try -O2:

movq %rax, (global)

Whoops! Where did rax = 5; go? It's dead code, since %rax is never used in the function, at least as far as GCC knows. GCC doesn't peek inside assembly. What happens when we remove __volatile__?

; empty

Well, you might think __volatile__ is doing you a service by keeping GCC from discarding your precious assembly, but it's just masking the fact that GCC thinks your assembly isn't doing anything. GCC thinks your assembly takes no inputs, produces no outputs, and clobbers no memory. You had better straighten it out:

long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ __volatile__ ("movq %%rax, (global)" : : : "memory");
}

Now we get the following output:

movq %rax, (global)

Better. But if you tell GCC about the inputs, it will make sure that %rax is properly initialized first:

long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ ("movq %%rax, (global)" : : "a"(rax) : "memory");
}

The output, with optimizations:

movl $5, %eax
movq %rax, (global)

Correct! And we don't even need to use __volatile__.
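If you want to go one step further, you can describe the store itself instead of clobbering all of memory. This variant is my own sketch, not something from the answer above: it assumes an "=m" output operand is acceptable here and lets GCC pick the source register. The #else branch only exists for non-x86-64 builds.

```c
long global;

void store_5(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* "=m"(global): the asm writes exactly this object in memory.
       "r"(5L): the value 5 arrives in a register of GCC's choosing. */
    __asm__ ("movq %1, %0" : "=m"(global) : "r"(5L));
#else
    global = 5;
#endif
}
```

With the data flow spelled out like this, there is no register variable, no "memory" clobber and no __volatile__ left to get wrong.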

Why does __volatile__ exist?

The primary correct use for __volatile__ is if your assembly code does something else besides input, output, or clobbering memory. Perhaps it messes with special registers which GCC doesn't know about, or affects IO. You see it a lot in the Linux kernel, but it's misused very often in user space.
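One case of that kind, sketched under the assumption that the x86 time-stamp counter is readable from user space: rdtsc has outputs but no inputs, so as far as GCC can tell two reads would yield the same value. read_ticks is a hypothetical name, and the clock() fallback is only there for other targets.

```c
#include <time.h>

/* Hypothetical helper: returns a tick count that changes between calls. */
unsigned long long read_ticks(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    unsigned int lo, hi;
    /* Without __volatile__, GCC could treat two of these statements as
       producing the same result and reuse the first one. */
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
#else
    return (unsigned long long)clock();
#endif
}
```

The operands are still fully described; __volatile__ only covers the part GCC cannot see, namely that the counter changes on its own.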

The __volatile__ keyword is very tempting because we C programmers often like to think we're almost programming in assembly language already. We're not. C compilers do a lot of data flow analysis, so you need to explain the data flow to the compiler for your assembly code. That way, the compiler can safely manipulate your chunk of assembly just like it manipulates the assembly that it generates.

If you find yourself using __volatile__ a lot, as an alternative you could write an entire function or module in an assembly file.

The compiler uses registers, and it may write over the values you have put in them.

In this case, the compiler probably uses the rbx register after the rrbx assignment and before the inline assembly section.

In general, you shouldn't expect registers to keep their values after and between inline assembly code sequences.
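If you do need several asm statements, one way around this (a sketch, with the made-up name add_in_two_steps) is to pass the intermediate value through a C variable, so that GCC is responsible for keeping it alive between the statements:

```c
/* Hypothetical helper: computes x + y using two separate asm statements. */
long add_in_two_steps(long x, long y)
{
    long acc;
#if defined(__GNUC__) && defined(__x86_64__)
    /* Step 1: acc = x. acc is an output, so GCC knows where the value
       lives once the statement ends. */
    __asm__ ("movq %1, %0" : "=r"(acc) : "r"(x));
    /* Step 2: acc += y. "+r" makes acc both input and output, so GCC
       itself carries the value from step 1 into this statement. */
    __asm__ ("addq %1, %0" : "+r"(acc) : "r"(y) : "cc");
#else
    acc = x;
    acc += y;
#endif
    return acc;
}
```

GCC is free to use the registers for anything it likes between the two statements; the value of acc survives because it is a named operand, not because a particular register was left alone.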

Slightly off-topic but I'd like to follow up a bit on gcc inline assembly.

The (non-)need for __volatile__ comes from the fact that GCC optimizes inline assembly. GCC inspects the assembly statement for side effects / prerequisites, and if it finds them not to exist it may choose to move the assembly instruction around or even decide to remove it. All __volatile__ does is to tell the compiler "stop caring and put this right there".

Which is usually not what you really want.

This is where the need for constraints comes in. The name is overloaded and actually used for different things in GCC inline assembly:

  • constraints specify input / output operands used in the asm() block
  • constraints specify the "clobber list", which details what "state" (registers, condition codes, memory) is affected by the asm().
  • constraints specify classes of operands (registers, addresses, offsets, constants, ...)
  • constraints declare associations / bindings between assembler entities and C/C++ variables / expressions
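A small sketch that touches all four meanings at once. The function add_with_bias and the operand names sum, val and bias are invented for illustration:

```c
/* Hypothetical helper: returns total + value + 10. */
int add_with_bias(int total, int value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
        "addl %[val], %[sum]\n\t"
        "addl %[bias], %[sum]"
        : [sum] "+r"(total)     /* in/out operand, bound to the C variable total */
        : [val] "r"(value),     /* operand class "r": any general register */
          [bias] "i"(10)        /* operand class "i": an immediate constant */
        : "cc");                /* clobber list: the flags are modified */
#else
    total += value + 10;
#endif
    return total;
}
```

The output and input lists name the operands, the letters pick the operand class, the parenthesised C expressions are the bindings, and the last section is the clobber list.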

In many cases, developers abuse __volatile__ because they noticed their code either being moved around or even disappearing without it. If this happens, it's usually a sign that the developer has not told GCC about side effects / prerequisites of the assembly. For example, this buggy code:

register int foo __asm__("rax") = 1234;
register int bar __asm__("rbx") = 4321;

asm("add %rax, %rbx");
printf("I'm expecting 'bar' to be 5555 it is: %d\n", bar);

It's got several bugs:

  • for one, it only compiles due to a gcc bug (!). Normally, to write register names in inline assembly, double %% are needed, but in the above if you actually specify them you get a compiler/assembler error, /tmp/ccYPmr3g.s:22: Error: bad register name '%%rax'.
  • second, it's not telling the compiler when and where you need/use the variables. Instead, it assumes the compiler honours asm() literally. That might be true for Microsoft Visual C++ but is not the case for gcc.

If you compile it without optimization, it creates:

0000000000400524 <main>:
[ ... ]
  400534:       b8 d2 04 00 00          mov    $0x4d2,%eax
  400539:       bb e1 10 00 00          mov    $0x10e1,%ebx
  40053e:       48 01 c3                add    %rax,%rbx
  400541:       48 89 da                mov    %rbx,%rdx
  400544:       b8 5c 06 40 00          mov    $0x40065c,%eax
  400549:       48 89 d6                mov    %rdx,%rsi
  40054c:       48 89 c7                mov    %rax,%rdi
  40054f:       b8 00 00 00 00          mov    $0x0,%eax
  400554:       e8 d7 fe ff ff          callq  400430 <printf@plt>
[ ... ]

You can find your add instruction, and the initializations of the two registers, and it'll print the expected result. If, on the other hand, you crank optimization up, something else happens:

0000000000400530 <main>:
  400530:       48 83 ec 08             sub    $0x8,%rsp
  400534:       48 01 c3                add    %rax,%rbx
  400537:       be e1 10 00 00          mov    $0x10e1,%esi
  40053c:       bf 3c 06 40 00          mov    $0x40063c,%edi
  400541:       31 c0                   xor    %eax,%eax
  400543:       e8 e8 fe ff ff          callq  400430 <printf@plt>
[ ... ]

Your initializations of both the "used" registers are no longer there. The compiler discarded them because nothing it could see was using them, and while it kept the assembly instruction it put it before any use of the two variables. It's there but it does nothing (luckily, actually ... if rax / rbx had been in use, who can tell what would have happened ...).

And the reason for that is that you haven't actually told GCC that the assembly is using these registers / these operand values. This has nothing whatsoever to do with volatile but all with the fact you're using a constraint-free asm() expression.

The way to do this correctly is via constraints, i.e. you'd use:

int foo = 1234;
int bar = 4321;

asm("add %1, %0" : "+r"(bar) : "r"(foo));
printf("I'm expecting 'bar' to be 5555 it is: %d\n", bar);

This tells the compiler that the assembly:

  1. has one argument in a register, "+r"(...), that both needs to be initialized before the assembly statement and is modified by the assembly statement, and associates the variable bar with it.
  2. has a second argument in a register, "r"(...), that needs to be initialized before the assembly statement and is treated as read-only / not modified by the statement. Here, foo is associated with that.
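Wrapped in a function, the two operands described above look like this. add_asm is a hypothetical name, and the "cc" clobber is something I added for clarity (as far as I know GCC assumes the flags are clobbered on x86 anyway):

```c
/* Hypothetical helper: returns bar + foo using a single add instruction. */
int add_asm(int bar, int foo)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("add %1, %0"
             : "+r"(bar)    /* operand 0: read and written, in a register */
             : "r"(foo)     /* operand 1: read only, in a register */
             : "cc");       /* add modifies the flags */
#else
    bar += foo;
#endif
    return bar;
}
```

add_asm(4321, 1234) should come back as 5555 at any optimization level, because GCC now sees both the inputs and the output.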

Notice no register assignment is specified - the compiler chooses that depending on the variables / state of the compile. The (optimized) output of the above:

0000000000400530 <main>:
  400530:       48 83 ec 08             sub    $0x8,%rsp
  400534:       b8 d2 04 00 00          mov    $0x4d2,%eax
  400539:       be e1 10 00 00          mov    $0x10e1,%esi
  40053e:       bf 4c 06 40 00          mov    $0x40064c,%edi
  400543:       01 c6                   add    %eax,%esi
  400545:       31 c0                   xor    %eax,%eax
  400547:       e8 e4 fe ff ff          callq  400430 <printf@plt>
[ ... ]

GCC inline assembly constraints are almost always necessary in some form or the other, but there can be multiple possible ways of describing the same requirements to the compiler; instead of the above, you could also write:

asm("add %1, %0" : "=r"(bar) : "r"(foo), "0"(bar));

This tells gcc:

  1. the statement has an output operand, the variable bar, that after the statement will be found in a register, "=r"(...)
  2. the statement has an input operand, the variable foo, which is to be placed into a register, "r"(...)
  3. operand zero is also an input operand and is to be initialized with bar
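The three points above describe what GCC calls a matching constraint: an input written as "0" shares its location with operand zero. A minimal sketch, assuming the hypothetical name add_matching and a separate result variable out:

```c
/* Hypothetical helper: returns bar + foo via a matching constraint. */
int add_matching(int bar, int foo)
{
    int out;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("add %1, %0"
             : "=r"(out)     /* operand 0: output only, in a register */
             : "r"(foo),     /* operand 1: input, in a register */
               "0"(bar)      /* operand 2: input, placed where operand 0 lives */
             : "cc");
#else
    out = bar + foo;
#endif
    return out;
}
```

Unlike "+r", this form lets the value that goes in (bar) and the value that comes out (out) be different C objects.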

Or, again an alternative:

asm("add %1, %0" : "+r"(bar) : "g"(foo));

which tells gcc:

  1. bla (yawn - same as before, bar both input/output)
  2. the statement has an input operand, the variable foo, which the statement doesn't care whether it's in a register, in memory or a compile-time constant (that's the "g"(...) constraint)

The result is different from the former:

0000000000400530 <main>:
  400530:       48 83 ec 08             sub    $0x8,%rsp
  400534:       bf 4c 06 40 00          mov    $0x40064c,%edi
  400539:       31 c0                   xor    %eax,%eax
  40053b:       be e1 10 00 00          mov    $0x10e1,%esi
  400540:       81 c6 d2 04 00 00       add    $0x4d2,%esi
  400546:       e8 e5 fe ff ff          callq  400430 <printf@plt>
[ ... ]

because now, GCC has actually figured out foo is a compile-time constant and simply embedded the value in the add instruction! Isn't that neat?
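As a function, the "g" variant might look like the sketch below (add_general is a made-up name). Which form the second operand takes depends on the caller and the optimization level, so treat any particular disassembly as one possible outcome.

```c
/* Hypothetical helper: returns bar + foo; foo may be a register,
   a memory location or an immediate, whichever GCC prefers. */
int add_general(int bar, int foo)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("add %1, %0"
             : "+r"(bar)    /* must be a register: it is the destination */
             : "g"(foo)     /* register, memory or constant */
             : "cc");
#else
    bar += foo;
#endif
    return bar;
}
```

The destination stays "+r" because add cannot take two memory operands; only the source is left open.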

Admittedly, this is complex and takes getting used to. The advantage is that letting the compiler choose which registers to use for what operands allows optimizing the code overall; if, for example, an inline assembly statement is used in a macro and/or a static inline function, the compiler can, depending on the calling context, choose different registers at different instantiations of the code. Or if a certain value is compile-time evaluatable / constant in one place but not in another, the compiler can tailor the created assembly for it.
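A sketch of that last point, using a rotate instruction: rol32 is a hypothetical helper, and the "cI" constraint is my choice for the example ("c" puts the count in %cl, "I" allows an immediate in the range 0..31).

```c
#include <stdint.h>

/* Hypothetical helper: rotates x left by n bits. */
static inline uint32_t rol32(uint32_t x, uint8_t n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* GCC decides per call site: an immediate when n is a suitable
       compile-time constant after inlining, otherwise the %cl register. */
    __asm__ ("roll %1, %0" : "+r"(x) : "cI"(n) : "cc");
    return x;
#else
    n &= 31;
    return n ? (uint32_t)((x << n) | (x >> (32 - n))) : x;
#endif
}
```

A call such as rol32(v, 4) in optimized code can become a rotate by an immediate, while rol32(v, count) with a run-time count goes through %cl, all from the same source line.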

Think of GCC inline assembly constraints as kind of "extended function prototypes" - they tell the compiler what types and locations for arguments / return values are, plus a bit more. If you don't specify these constraints, your inline assembly is creating the analogue of functions that operate on global variables/state only - which, as we probably all agree, are rarely ever doing exactly what you intended.

Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address.