How to specify a clobbered bottom of the x87 FPU stack with extended gcc assembly?

Question

In a codebase of ours I found this snippet for fast, towards-negative-infinity¹ rounding on x87:

inline int my_int(double x)
{
  int r;
#ifdef _GCC_
  asm ("fldl %1\n"
       "fistpl %0\n"
       :"=m"(r)
       :"m"(x));
#else
  // ...
#endif
  return r;
}

I'm not very familiar with GCC extended assembly syntax, but this is what I gather from the documentation:

r must be a memory location, where I write back the result;
x must be a memory location too, where the data comes from;
there's no clobber specification, so the compiler can rest assured that at the end of the snippet the registers are as it left them.

Now, to come to my question: it's true that in the end the FPU stack is balanced, but what if all 8 locations were already in use and I'm overflowing it? How can the compiler know that it cannot trust ST(7) to be where it left it? Should some clobber be added?

Edit: I tried specifying st(7) in the clobber list and it does seem to affect the codegen; now I'll wait for some confirmation of this.
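For reference, GCC's documentation on x86 floating-point asm operands states that "some asm statements may need extra stack space for internal calculations. This can be guaranteed by clobbering stack registers unrelated to the inputs and outputs." A minimal sketch of the original snippet with that clobber added might look like this; it assumes GCC on x86/x86-64, and the name my_int_st7 is hypothetical.

```c
/* Sketch: the original fldl/fistpl snippet with st(7) declared as clobbered.
 * fldl pushes one value onto the x87 stack and fistpl pops it again, so the
 * asm temporarily needs one free slot; clobbering st(7), which is unrelated
 * to the (memory) operands, asks GCC to guarantee that slot is free.
 * Assumes GCC (or a compatible compiler) targeting x86 or x86-64. */
static inline int my_int_st7(double x)
{
    int r;
    __asm__ ("fldl %1\n\t"   /* push x onto the x87 stack               */
             "fistpl %0"     /* store it as a 32-bit int, then pop it    */
             : "=m" (r)
             : "m" (x)
             : "st(7)");
    return r;
}
```

The conversion still follows the current x87 rounding mode; only the stack-space guarantee changes.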

As a side note: looking at the implementation of the barebones lrint in both glibc and MinGW, I see something like

__asm__ __volatile__ ("fistpl %0"
                      : "=m" (retval)
                      : "t" (x)
                      : "st");

where we are asking for the input to be placed directly in ST(0) (which avoids that potentially useless fldl); what is that "st" clobber? The docs seem to mention only t (i.e. the top of the stack).

¹ Yes, it depends on the current rounding mode, which in our application should always be "towards negative infinity".
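Since fistpl rounds according to the rounding-control (RC) field of the x87 control word (bits 11:10, where 01 selects rounding toward negative infinity), the mode this footnote relies on can be inspected or set directly. The following is a hedged sketch for GCC on x86/x86-64; all helper names are hypothetical, and portable code would normally call fesetround(FE_DOWNWARD) from <fenv.h> instead.

```c
/* Hypothetical helpers for the x87 FPU control word.
 * RC field (bits 11:10): 00 = nearest (even), 01 = toward -inf,
 *                        10 = toward +inf,    11 = toward zero. */
#define X87_RC_MASK 0x0C00u
#define X87_RC_DOWN 0x0400u

static inline unsigned short x87_get_cw(void)
{
    unsigned short cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (cw));  /* store control word */
    return cw;
}

static inline void x87_set_cw(unsigned short cw)
{
    __asm__ __volatile__ ("fldcw %0" : : "m" (cw));  /* load control word */
}

/* Switch x87 rounding to "toward -inf"; returns the previous control word
 * so the caller can restore it. Other control bits are preserved. */
static inline unsigned short x87_round_down(void)
{
    unsigned short old = x87_get_cw();
    x87_set_cw((unsigned short)((old & ~X87_RC_MASK) | X87_RC_DOWN));
    return old;
}

/* fistpl-based conversion that honours whatever RC currently selects. */
static inline int x87_to_int(double x)
{
    int r;
    __asm__ __volatile__ ("fistpl %0" : "=m" (r) : "t" (x) : "st");
    return r;
}
```

With RC set to toward -inf, x87_to_int(-2.5) yields -3; after restoring the default round-to-nearest mode it yields -2.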

Answer 1

looking at the implementation of the barebones lrint in both glibc and MinGW, I see something like
 __asm__ __volatile__ ("fistpl %0" : "=m" (retval) : "t" (x) : "st"); 
where we are asking for the input to be placed directly in ST(0) (which avoids that potentially useless fldl)

This is actually the correct way to represent the code you want as inline assembly.

To get the best possible code generated, you want to make use of the inputs and outputs. Rather than hard-coding the necessary load/store instructions, let the compiler generate them. Not only does this make it possible to elide potentially unnecessary instructions, it also means that the compiler can better schedule these instructions when they are required (that is, it can interleave the instruction within a prior sequence of code, often minimizing its cost).

what is that "st" clobber? The docs seem to mention only t (i.e. the top of the stack).

The "st" clobber refers to the st(0) register, i.e., the top of the x87 FPU stack. What Intel/MASM notation calls st(0), AT&T/GAS notation generally refers to as simply st. And, as per GCC's documentation for clobbers, the items in the clobber list are "either register names or the special clobbers" ("cc" (condition codes/flags) and "memory"). So this just means that the inline assembly clobbers (overwrites) the st(0) register. This clobber is necessary because the fistpl instruction pops the top of the stack, thus destroying the original contents of st(0).
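Putting these pieces together, a minimal sketch of the asker's my_int rewritten in the glibc/MinGW style could look like the following. It assumes GCC on x86/x86-64, and the conversion follows the current x87 rounding mode (under the default round-to-nearest mode, 1.5 converts to 2).

```c
/* my_int in the style of the glibc/MinGW lrint snippet: GCC itself loads x
 * into st(0) ("t" constraint), so no hard-coded fldl is needed; fistpl then
 * converts and pops st(0), which is why "st" is listed as clobbered. */
static inline int my_int(double x)
{
    int r;
    __asm__ __volatile__ ("fistpl %0"
                          : "=m" (r)   /* result written to memory   */
                          : "t" (x)    /* input placed in st(0)      */
                          : "st");     /* st(0) is popped by fistpl  */
    return r;
}
```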

The only thing that concerns me regarding this code is the following paragraph from the documentation:

Clobber descriptions may not in any way overlap with an input or output operand. For example, you may not have an operand describing a register class with one member when listing that register in the clobber list. Variables declared to live in specific registers (see Explicit Register Variables) and used as asm input or output operands must have no part mentioned in the clobber description. In particular, there is no way to specify that input operands get modified without also specifying them as output operands.

When the compiler selects which registers to use to represent input and output operands, it does not use any of the clobbered registers. As a result, clobbered registers are available for any use in the assembler code.

As you already know, the t constraint means the top of the x87 FPU stack. The problem is, this is the same as the st register, and the documentation very clearly says that we cannot have a clobber that specifies the same register as one of the input/output operands. Furthermore, since the documentation states that the compiler is forbidden to use any of the clobbered registers to represent input/output operands, this inline assembly makes an impossible request: load this value at the top of the x87 FPU stack without putting it in st!

Now, I would assume that the authors of glibc know what they are doing and are more familiar with the compiler's implementation of inline assembly than you or I, so this code is probably legal and legitimate.

Actually, it seems that the unusual case of the x87's stack-like registers forces an exception to the normal interactions between clobbers and operands. The official documentation says:

On x86 targets, there are several rules on the usage of stack-like registers in the operands of an asm. These rules apply only to the operands that are stack-like registers:

Given a set of input registers that die in an asm, it is necessary to know which are implicitly popped by the asm, and which must be explicitly popped by GCC.
An input register that is implicitly popped by the asm must be explicitly clobbered, unless it is constrained to match an output operand.
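To illustrate the other branch of that rule: an instruction that consumes st(0) but leaves its result in the same register (here fsqrt) is simply written with the input tied to the output through the "0" matching constraint, and needs no clobber. This is a hedged sketch assuming GCC on x86/x86-64; x87_sqrt is a hypothetical name.

```c
/* fsqrt replaces st(0) with its square root without popping anything, so the
 * input is constrained to match output 0 ("=t") rather than being clobbered. */
static inline double x87_sqrt(double x)
{
    double r;
    __asm__ ("fsqrt" : "=t" (r) : "0" (x));
    return r;
}
```

Contrast this with fistpl, which pops its input and produces its result in memory, so its st(0) input cannot be matched to an output and must be listed as clobbered instead.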

That fits our case exactly.

Further confirmation is provided by an example appearing in the official documentation (bottom of the linked section):

This asm takes two inputs, which are popped by the fyl2xp1 opcode, and replaces them with one output. The st(1) clobber is necessary for the compiler to know that fyl2xp1 pops both inputs.
 asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)"); 
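As a runnable illustration, the documentation's fyl2xp1 example can be wrapped in a function. This is a sketch assuming GCC on x86/x86-64; the name y_log2_1px is hypothetical, and fyl2xp1 computes st(1) * log2(st(0) + 1), with st(0) required to lie within about ±0.29 (that is, |x| < 1 - sqrt(2)/2).

```c
/* Wraps the fyl2xp1 example from the GCC manual: returns y * log2(x + 1).
 * x is placed in st(0) (tied to the "=t" output), y in st(1) ("u").
 * fyl2xp1 replaces both inputs with a single result in st(0); st(1) is
 * listed as clobbered because the instruction pops it.
 * Precondition (per the instruction's definition): |x| < 1 - sqrt(2)/2. */
static inline double y_log2_1px(double x, double y)
{
    double result;
    __asm__ ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");
    return result;
}
```

For example, y_log2_1px(0.25, 2.0) is 2 * log2(1.25), roughly 0.64386.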

Here, the clobber st(1) is the same as the input constraint u, which seems to violate the above-quoted documentation regarding clobbers, but it is used and justified for precisely the same reason that "st" is used as the clobber in your original code: fistpl pops the input.

All of that said, and now that you know how to correctly write the code in inline assembly, I have to echo previous commenters who suggested that the best solution would be not to use inline assembly at all. Just call lrint, which not only has the exact semantics that you want (it rounds according to the current rounding mode), but can also be better optimized by the compiler under certain circumstances (e.g., transforming it into a single cvtsd2si instruction when the target architecture supports SSE).

Answer 1: 4 votes, accepted, 2017-05-26 09:09:50

如何用扩展的gcc程序集指定x87 FPU堆栈的破坏底部？

问题描述

1 个解决方案

解决方案1 4 已采纳 2017-05-26 09:09:50

解决方案1
4 已采纳 2017-05-26 09:09:50