How do you ensure that reads and writes of operands happen at the desired time in Extended ASM?

Question

According to GCC's Extended ASM and Assembler Template documentation, instructions that must stay consecutive have to be placed in the same asm block. I'm having trouble understanding what determines the scheduling, or timing, of reads and writes to the operands in a block that contains multiple statements.

As an example, EBX or RBX needs to be preserved when using CPUID because, according to the ABI, the caller owns it. There are some open questions with respect to the use of EBX and RBX, so we want to preserve it unconditionally (it's a requirement). So three instructions need to be encoded into a single asm block to ensure that they stay consecutive (see the assembler template discussion in the first paragraph):

unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;

__asm__ __volatile__ (
  "push %%ebx;"   /* register names need %% in extended asm */
  "cpuid;"
  "pop %%ebx"
  : "=a"(__EAX), "=b"(__EBX), "=c"(__ECX), "=d"(__EDX)
  : "a"(__FUNC), "c"(__SUBFUNC)
);

If the expression representing the operands is interpreted at the wrong point in time, then __EBX will be the saved EBX (and not CPUID's EBX), which will likely be a pointer to the Global Offset Table (GOT) if PIC is enabled.

Where, exactly, does the expression specify that the store of CPUID's %EBX into __EBX should happen (1) after the PUSH %EBX, (2) after the CPUID, but (3) before the POP %EBX?

Answer 1

In your question you present some code that pushes and pops ebx. The idea of saving ebx when you compile with gcc using -fPIC (position-independent code) is correct: in that situation it is up to our function not to clobber ebx upon return. Unfortunately, the way you have defined the constraints, you explicitly use ebx as an output. Generally, if you are compiling PIC code and specify =b as an output constraint, the compiler rejects it (error: inconsistent operand constraints in an 'asm'). It is unusual that it doesn't produce this error for you.

To get around this problem, you can let the compiler choose a register for you through the assembler template. Instead of pushing and popping, we simply exchange %ebx with an unused register chosen by the compiler, and restore it afterwards by exchanging it back. Because that register is written by the first exchange while the input registers are still needed by cpuid, we specify the early-clobber modifier so the compiler does not assign it to one of the inputs. That gives the constraint =&r (instead of =b in the OP's code). More on modifiers can be found in the GCC documentation on constraint modifier characters. Your code (for 32-bit) would look something like:

unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;

__asm__ __volatile__ (
       "xchgl\t%%ebx, %k1\n\t"
       "cpuid\n\t"
       "xchgl\t%%ebx, %k1\n\t"
  : "=a"(__EAX), "=&r"(__EBX), "=c"(__ECX), "=d"(__EDX)
  : "a"(__FUNC), "c"(__SUBFUNC));

If you intend to compile for x86_64 (64-bit), you'll need to save the entire contents of %rbx, so the code above will not quite work. You'd have to use something like:

#include <stdint.h>

uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */

__asm__ __volatile__ (
       "xchgq\t%%rbx, %q1\n\t"
       "cpuid\n\t"
       "xchgq\t%%rbx, %q1\n\t"
  : "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
  : "a"(__FUNC), "c"(__SUBFUNC));
/* The low 32 bits of __BX now hold CPUID's ebx. */
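To see why the 32-bit xchgl version is not enough on x86_64, here is a small sketch, assuming GCC or Clang targeting x86_64 (the function names are illustrative only). In 64-bit mode, any instruction that writes a 32-bit register zero-extends the result into the full 64-bit register. Swapping a value out and back with xchgl therefore loses its upper half, while xchgq round-trips all 64 bits.

```c
#include <stdint.h>

#if !defined(__x86_64__)
#error "This sketch assumes an x86_64 target."
#endif

/* Swap v with a scratch register and back using 32-bit xchgl.
 * Each 32-bit write zero-extends, so the upper half of v is lost. */
static uint64_t roundtrip_xchgl(uint64_t v)
{
    uint64_t scratch = 0;
    __asm__ ("xchgl\t%k0, %k1\n\t"
             "xchgl\t%k0, %k1"
             : "+r"(v), "+r"(scratch));
    return v;
}

/* The same round trip with 64-bit xchgq preserves every bit. */
static uint64_t roundtrip_xchgq(uint64_t v)
{
    uint64_t scratch = 0;
    __asm__ ("xchgq\t%q0, %q1\n\t"
             "xchgq\t%q0, %q1"
             : "+r"(v), "+r"(scratch));
    return v;
}
```

With v = 0x1234567800000001, the xchgl version returns 1 and the xchgq version returns the input unchanged. The same thing would happen to the caller's rbx if it were saved and restored with xchgl.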

You could code this up using conditional compilation to handle both x86_64 and i386:

#include <stdint.h>

uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
/* Register-sized: 32 bits on i386, 64 bits on x86_64. A uint64_t here
 * would need a register pair for the "r" constraint on i386. */
uintptr_t __BX;

#if defined(__i386__)
    __asm__ __volatile__ (
           "xchgl\t%%ebx, %k1\n\t"
           "cpuid\n\t"
           "xchgl\t%%ebx, %k1\n\t"
      : "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
      : "a"(__FUNC), "c"(__SUBFUNC));

#elif defined(__x86_64__)
    __asm__ __volatile__ (
           "xchgq\t%%rbx, %q1\n\t"
           "cpuid\n\t"
           "xchgq\t%%rbx, %q1\n\t"
      : "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
      : "a"(__FUNC), "c"(__SUBFUNC));
#else
#error "Unknown architecture."
#endif
/* In both cases the low 32 bits of __BX hold CPUID's ebx. */
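Wrapped in a function, the conditional version can be checked against the compiler's own <cpuid.h>. This is a sketch assuming GCC or Clang on i386 or x86_64. The struct and function names (cpuid_regs, cpuid_keep_bx, matches_cpuid_h) are hypothetical. It stores the swapped register in a uintptr_t so that it is 32 bits on i386 and 64 bits on x86_64. The cross-check uses leaf 0 because that result is the same on every core.

```c
#include <stdint.h>
#include <cpuid.h>   /* GCC/Clang header; used only for the cross-check */

typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs;

/* Run CPUID while keeping ebx/rbx intact via the xchg pattern. */
static cpuid_regs cpuid_keep_bx(uint32_t leaf, uint32_t subleaf)
{
    cpuid_regs r;
    uintptr_t bx;   /* holds the caller's ebx/rbx while it is swapped out */
#if defined(__i386__)
    __asm__ __volatile__ (
        "xchgl\t%%ebx, %k1\n\t"
        "cpuid\n\t"
        "xchgl\t%%ebx, %k1\n\t"
        : "=a"(r.eax), "=&r"(bx), "=c"(r.ecx), "=d"(r.edx)
        : "a"(leaf), "c"(subleaf));
#elif defined(__x86_64__)
    __asm__ __volatile__ (
        "xchgq\t%%rbx, %q1\n\t"
        "cpuid\n\t"
        "xchgq\t%%rbx, %q1\n\t"
        : "=a"(r.eax), "=&r"(bx), "=c"(r.ecx), "=d"(r.edx)
        : "a"(leaf), "c"(subleaf));
#else
#error "Unknown architecture."
#endif
    r.ebx = (uint32_t)bx;   /* low 32 bits are CPUID's ebx */
    return r;
}

/* Compare against the __cpuid_count macro from <cpuid.h>. */
static int matches_cpuid_h(uint32_t leaf, uint32_t subleaf)
{
    unsigned int a, b, c, d;
    cpuid_regs r = cpuid_keep_bx(leaf, subleaf);
    __cpuid_count(leaf, subleaf, a, b, c, d);
    return r.eax == a && r.ebx == b && r.ecx == c && r.edx == d;
}
```

Leaves whose ebx contains a per-core value (such as leaf 1, which reports the APIC ID) are a poor choice for this kind of comparison, because the thread may migrate between the two calls.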

GCC has a __cpuid macro defined in cpuid.h. It is written so that it saves the ebx/rbx register only when required. You can look at the GCC 4.8.1 definition of the macro in cpuid.h to get an idea of how they handle cpuid.
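If you don't need hand-written asm at all, the helpers in <cpuid.h> can do the work. Here is a minimal sketch, assuming GCC or Clang on x86, that reads the CPU vendor string with __get_cpuid. The cpu_vendor function is a hypothetical name. Leaf 0 returns the 12-byte vendor string split across ebx, edx, and ecx, in that order.

```c
#include <cpuid.h>
#include <string.h>

/* Fill out[] with the NUL-terminated vendor string from CPUID leaf 0.
 * Returns 0 if __get_cpuid reports that the leaf is unavailable. */
static int cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    memcpy(out, &ebx, 4);      /* e.g. "Genu" */
    memcpy(out + 4, &edx, 4);  /* e.g. "ineI" */
    memcpy(out + 8, &ecx, 4);  /* e.g. "ntel" */
    out[12] = '\0';
    return 1;
}
```

On Intel processors this yields "GenuineIntel", and on AMD processors "AuthenticAMD". The ebx handling is left to whatever the installed cpuid.h does for the current target.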

The astute reader may ask: what stops the compiler from choosing ebx or rbx as the scratch register for the exchange? The compiler knows about ebx and rbx in the context of PIC and will not allow them to be used as a scratch register. This is based on my personal observations over the years and on reviewing the assembler (.s) files generated from C code. I can't say for certain how older versions of gcc handled it, so it could be a problem there. Even if the compiler did choose ebx (or rbx), the result would still be correct: both exchanges would become no-ops, and the output would be read from ebx itself, which holds CPUID's value at that point.

Answer 2

I think you understand this, but to be clear, the "consecutive" rule means that this:

asm ("a");
asm ("b");
asm ("c");

... might get other instructions interposed, so if that's not desirable, it must be rewritten like this:

asm ("a\n"
     "b\n"
     "c");

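... and now it will be inserted as a whole. The operands behave the same way: the compiler treats the whole template as one opaque unit. It places the inputs in their registers before the first instruction and reads the outputs only after the last one. There is no way to tell it to read ebx after the cpuid but before the pop. With push/cpuid/pop and "=b", __EBX receives whatever the pop restored. To capture an intermediate value, you have to copy it into another output register inside the template.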
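A small experiment makes the timing of output operands concrete. This is a sketch assuming GCC or Clang on x86 or x86_64, and the function names are illustrative. A register is saved, overwritten, and restored inside one block, mimicking push/cpuid/pop, and the output only ever sees the restored value. Copying the intermediate value into a second output before the restore, the same idea as moving ebx into ecx before popping, is what captures it.

```c
/* Save, overwrite, restore: the compiler reads %0 only after the
 * last instruction, so the overwritten value (42) is never seen. */
static unsigned int read_after_restore(void)
{
    unsigned int value = 7, saved;
    __asm__ ("movl\t%0, %1\n\t"    /* save (like push)       */
             "movl\t$42, %0\n\t"   /* overwrite (like cpuid) */
             "movl\t%1, %0"        /* restore (like pop)     */
             : "+r"(value), "=&r"(saved));
    return value;
}

/* Same sequence, but copy the intermediate value into another
 * output register before restoring. */
static unsigned int copy_before_restore(unsigned int *captured)
{
    unsigned int value = 7, saved, copy;
    __asm__ ("movl\t%0, %1\n\t"    /* save                    */
             "movl\t$42, %0\n\t"   /* overwrite               */
             "movl\t%0, %2\n\t"    /* capture the 42          */
             "movl\t%1, %0"        /* restore                 */
             : "+r"(value), "=&r"(saved), "=&r"(copy));
    *captured = copy;
    return value;
}
```

read_after_restore() returns 7. copy_before_restore() also returns 7, but it stores 42 through its pointer. The early-clobber on saved is needed because it is written before value's input has been used.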

As for the cpuid snippet, we have two problems:

1. The cpuid instruction will overwrite ebx, and hence clobber the data that PIC code must keep there.
2. We want to extract the value that cpuid places in ebx while never returning to compiled code with the "wrong" ebx value.

One possible solution would be this:

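/* 32-bit only: push %ebx cannot be encoded in 64-bit mode. */
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
unsigned int __UNUSED;  /* receives eax from the first cpuid */

__asm__ __volatile__ (
  "push %%ebx;"
  "cpuid;"
  "mov %%ebx, %%ecx;"   /* copy CPUID's ebx out before restoring ebx */
  "pop %%ebx"
  : "=c"(__EBX), "=a"(__UNUSED)   /* eax is an input, so it must be an output, not a clobber */
  : "a"(__FUNC), "c"(__SUBFUNC)
  : "edx"
);
__asm__ __volatile__ (
  "push %%ebx;"
  "cpuid;"
  "pop %%ebx"
  : "=a"(__EAX), "=c"(__ECX), "=d"(__EDX)
  : "a"(__FUNC), "c"(__SUBFUNC)
);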

There's no need to mark ebx as clobbered, since you're putting it back the way you found it.

(I don't do much Intel programming, so I may have some of the assembler-specific details wrong, but this is how asm works.)


Answer 1
Score 3, accepted, 2015-08-19 23:29:47

Answer 2
Score 2, 2015-08-19 20:31:29
