
What does brace "{" mean in AT&T assembly

I am using Intel Xeon Phi. I compile the program like this:

icpc -mmic -S xxxx.cpp

There is some syntax I don't understand in the assembly code:

     vgetmantpd $0, %zmm2, %zmm9{%k3}                        #85.59 c79
     vsubpd    %zmm11, %zmm10, %zmm12{%k3}                   #85.59 c83
     vpminsd   %zmm14{aaaa}, %zmm12, %zmm13                  #85.59 c87
     vcvtpd2ps {rz-sae}, %zmm9, %zmm6{%k3}                   #85.59 c91
     vpminud   %zmm14{bbbb}, %zmm13, %zmm15                  #85.59 c95

What do the "{" and "}" mean in %zmm12{%k3}? And what is %k3? What is %zmm14{bbbb}?

Michael is correct on all three points:

1) The {aaaa} and {bbbb} are operand qualifiers that direct each "lane" of the input register (zmm14, in both cases) to be "swizzled" in a particular manner. "{aaaa}" means the low-order element of each lane is to be replicated to all four "elements" of the lane. So if zmm14 contained, from high-order to low-order, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, then zmm14{aaaa} would be 130, 130, 130, 130, 90, 90, 90, 90, 50, 50, 50, 50, 10, 10, 10, 10, and zmm14{bbbb} would be 140, 140, 140, 140, 100, 100, 100, 100, 60, 60, 60, 60, 20, 20, 20, 20. zmm14{dcba} is the default swizzle, i.e. the same as just saying zmm14, and it is no swizzle at all.

2) The {k3} operand qualifier (written {%k3} in the listing) means: only change those elements of the output register (zmm9, in the topmost instruction) for which the corresponding bit in the k3 mask register is set; leave all other elements in zmm9 unchanged.

3) And Michael is also totally on target that you really aren't going to be able to divine all this stuff on your own. You are going to need to study the architectural documents, because the Xeon Phi VPU architecture is quite a bit different from MMX and SSE: it introduces mask registers (which are used as predicates to control which elements are modified), swizzles, broadcasts, and up- and down-conversions. In the document Michael linked, the relevant chapter for an introduction to this level of the Xeon Phi architecture is chapter 7. Another document you might peruse is this one: http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-vector-microarchitecture
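To make points 1) and 2) concrete, here is a minimal scalar sketch in Python of what a swizzle and a write mask do to the elements of a register. It is an illustration only, not how the hardware or the compiler implements them; the function names `swizzle` and `masked_write` are made up for this example, and registers are modelled as plain lists with the low-order element first. The sketch accepts any four-letter pattern, whereas the hardware supports only a fixed set of swizzles.

```python
def swizzle(reg, pattern):
    """Apply a 4-letter swizzle pattern to every 4-element lane of reg.

    reg is listed low-order element first. pattern is written the way the
    assembly listing shows it, high-order position first, so "dcba" is the
    identity (no swizzle).
    """
    assert len(pattern) == 4 and len(reg) % 4 == 0
    # Reverse the pattern so that index 0 describes the low-order position.
    sources = [ord(letter) - ord("a") for letter in reversed(pattern)]
    out = []
    for lane_start in range(0, len(reg), 4):
        lane = reg[lane_start:lane_start + 4]
        out.extend(lane[src] for src in sources)
    return out


def masked_write(dest, result, mask):
    """Return dest with result[i] written only where bit i of mask is set."""
    return [r if (mask >> i) & 1 else d
            for i, (d, r) in enumerate(zip(dest, result))]


# zmm14 from the example above, low-order element first: 10, 20, ..., 160.
zmm14 = list(range(10, 170, 10))

# Printed high-order first, to match the way the values are listed above.
print(swizzle(zmm14, "aaaa")[::-1])
print(swizzle(zmm14, "bbbb")[::-1])

# Hypothetical mask value with bits 0 and 2 set: only elements 0 and 2
# of the destination are overwritten, the rest keep their old contents.
print(masked_write([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8], 0b00000101))
```

The mask example uses eight elements because a 512-bit register holds eight double-precision values, as in the vsubpd line of the listing; the mask value itself is an assumption chosen for illustration.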

Not mentioned in your exact query or in Michael's response is the {rz-sae} instruction qualifier. It means that the instruction should round toward zero (rz) and should suppress all arithmetic exceptions (sae), i.e. handle them silently.
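As a rough illustration of the rounding half of {rz-sae}, the sketch below narrows a double to single precision in Python, once with the usual round-to-nearest and once rounding toward zero. The helper names are hypothetical, the code assumes a finite input that fits in the single-precision range, and it says nothing about the exception-suppression half of the qualifier.

```python
import struct


def to_f32_nearest(x):
    """Narrow a Python float (double) to single precision, round-to-nearest."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_toward_zero(x):
    """Narrow a double to single precision, rounding toward zero.

    Assumes x is finite and within single-precision range.
    """
    f = to_f32_nearest(x)
    if abs(f) > abs(x):
        # Round-to-nearest moved away from zero. Decrementing the IEEE 754
        # bit pattern of a nonzero float steps its magnitude one unit
        # toward zero, for either sign.
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
        f = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return f


# A value slightly above the midpoint between two adjacent single-precision
# numbers, 1.0 and 1.0 + 2**-23.
x = 1.0 + 2**-24 + 2**-30
print(to_f32_nearest(x), to_f32_toward_zero(x))
```

With round-to-nearest the value above goes up to the next single-precision number; rounding toward zero truncates it to 1.0 instead.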

Regards, Brian R. Nickerson
