I am using an Intel Xeon Phi. I compile the program like this:
icpc -mmic -S xxxx.cpp
There is some syntax in the generated assembly code that I don't understand:
vgetmantpd $0, %zmm2, %zmm9{%k3} #85.59 c79
vsubpd %zmm11, %zmm10, %zmm12{%k3} #85.59 c83
vpminsd %zmm14{aaaa}, %zmm12, %zmm13 #85.59 c87
vcvtpd2ps {rz-sae}, %zmm9, %zmm6{%k3} #85.59 c91
vpminud %zmm14{bbbb}, %zmm13, %zmm15 #85.59 c95
What do the "{" and "}" mean in %zmm12{%k3}? What is %k3? And what is %zmm14{bbbb}?
Michael is correct on all three points:
1) {aaaa} and {bbbb} are operand qualifiers that direct each four-element "lane" of the input register (zmm14, in both cases) to be "swizzled" in a particular manner. {aaaa} means the low-order element of each lane is replicated to all four elements of that lane; {bbbb} does the same with the second-lowest element. So if zmm14 contained, from high-order to low-order, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, then zmm14{aaaa} would be 130, 130, 130, 130, 90, 90, 90, 90, 50, 50, 50, 50, 10, 10, 10, 10, and zmm14{bbbb} would be 140, 140, 140, 140, 100, 100, 100, 100, 60, 60, 60, 60, 20, 20, 20, 20. zmm14{dcba} is the default swizzle, i.e. the same as writing just zmm14: no swizzle at all.
2) The {%k3} operand qualifier means: change only those elements of the output register (zmm9, in the topmost instruction) for which the corresponding bit in the k3 mask register is set, and leave all other elements of zmm9 unchanged.
3) Michael is also totally on target that you really aren't going to be able to divine all of this from the assembly alone. You are going to need to study the architectural documents, because the Xeon Phi VPU architecture is quite a bit different from MMX and SSE: it introduces mask registers (which are used as predicates to control which elements are modified), swizzles, broadcasts, and up- and down-conversions. In the document Michael linked, the relevant chapter for an introduction to this level of the Xeon Phi architecture is chapter 7. Another document you might peruse is this one: http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-vector-microarchitecture
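To make points 1) and 2) concrete, here is a rough Python model of the two behaviours. It is only a sketch for building intuition, not how the hardware is implemented: the function names swizzle and masked_write are made up for illustration, registers are plain lists with the low-order element first, and the swizzle model assumes 32-bit elements (16 per register, four per lane).

```python
def swizzle(reg, pattern):
    """Model a per-lane swizzle such as {aaaa}, {bbbb} or {dcba}.

    reg     -- list of elements, low-order element first (hypothetical layout)
    pattern -- four letters from 'abcd', written high-to-low as in the assembly
    """
    out = []
    for start in range(0, len(reg), 4):
        lane = reg[start:start + 4]          # lane[0] is 'a', lane[3] is 'd'
        # The pattern is written high-to-low, so reverse it to fill low-to-high.
        out.extend(lane["abcd".index(ch)] for ch in reversed(pattern))
    return out


def masked_write(dest, result, mask):
    """Model a write-masked destination such as %zmm9{%k3}.

    Element i of dest is replaced by result[i] only if bit i of mask is set;
    every other element keeps its old value.
    """
    return [r if (mask >> i) & 1 else d
            for i, (d, r) in enumerate(zip(dest, result))]


# The register from point 1), listed low-order first: 10, 20, ..., 160.
zmm14 = list(range(10, 170, 10))
print(swizzle(zmm14, "aaaa"))   # each lane filled with its lowest element
print(swizzle(zmm14, "bbbb"))   # each lane filled with its second-lowest element
print(swizzle(zmm14, "dcba"))   # identity: same as zmm14

# A mask with bits 0 and 2 set updates only elements 0 and 2.
print(masked_write([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8], 0b00000101))
```

Note that the lists here print low-order first, which is the reverse of the high-to-low order used in the example in point 1).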
Not mentioned in your exact query or in Michael's response is that the {rz-sae} instruction qualifier means that the instruction should round toward zero ("rz") and should suppress all floating-point exceptions ("sae"), i.e. handle arithmetic exceptions silently.
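As a rough illustration of the rounding half of that qualifier, the following Python sketch contrasts round-to-nearest with round-toward-zero when narrowing a double to single precision, which is what vcvtpd2ps does per element. The helper names are hypothetical, the code handles only finite values within single-precision range, and it does not model the exception-suppression part at all.

```python
import struct


def to_f32_nearest(x):
    """Narrow a Python float (double) to single precision, round-to-nearest."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_toward_zero(x):
    """Narrow a double to single precision, rounding toward zero.

    Start from the round-to-nearest result; if that result has a larger
    magnitude than x, step one representable float32 value toward zero.
    """
    y = to_f32_nearest(x)
    if abs(y) > abs(x):
        bits = struct.unpack("<I", struct.pack("<f", y))[0]
        # Decrementing the bit pattern shrinks the magnitude for either sign.
        y = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return y


x = 0.1
print(to_f32_nearest(x) > x)       # nearest float32 lies above 0.1
print(to_f32_toward_zero(x) < x)   # truncated result lies below 0.1
```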
Regards, Brian R. Nickerson