According to both this reference and this reference, the shr instruction supports shifting by 1, by the CL register, and by an immediate value. However, I cannot get the immediate form to work. When I compile this code:
#include <stdint.h>
int main() {
    uint64_t v = 15;
    asm ("shr %[v], $0x04\t\n"
        : [v] "+r" (v)
        :
        : "cc"
    );
    return v;
}
I get this error message:
$ gcc -masm=intel foo.c
foo.c: Assembler messages:
foo.c:5: Error: operand size mismatch for `shr'
How can I pass an immediate value to shr without loading it into CL? I care about this because I'm optimizing for register pressure.
You're using guides written for Intel assembly syntax. The GNU assembler (GAS) defaults to AT&T syntax, which reverses the order of operands. Swapping them works:
uint64_t v = 0xffff;
asm ("shr $0x04, %[v]\n"
: [v] "+r" (v)
:
: "cc"
);
printf("%llx", (unsigned long long)v); // prints fff
(You can also replace shr with shrq to make the 64-bit operand size explicit.)
If you still want to use Intel syntax, as you do with -masm=intel, you have to drop the dollar sign from the immediate value:
asm ("shr %[v], 4\n"
...)
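As a minimal sketch of the two forms above combined, the following uses GCC's {AT&T | Intel} dialect alternatives so that the same source should assemble with or without -masm=intel. The helper name shr4 is hypothetical, and the plain C fallback for non-x86-64 targets is an assumption added only so the snippet compiles everywhere.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): shift right by 4. */
static inline uint64_t shr4(uint64_t v)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* The compiler emits only the half of {AT&T | Intel} that matches the
       active dialect: "$4, reg" for AT&T, "reg, 4" for Intel. */
    __asm__ ("shr {$4, %[v] | %[v], 4}"
             : [v] "+r" (v)
             :
             : "cc");
    return v;
#else
    /* Assumed fallback for other targets: an ordinary C shift. */
    return v >> 4;
#endif
}
```

Calling shr4(0xffff) should give 0xfff, matching the AT&T example above.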
Using -masm=intel activates .intel_syntax noprefix, so immediates no longer take a $ prefix. (For addresses, though, you need OFFSET symbol.) In short, don't use the $.
Obviously, if you just want a shift, you should do it in C instead of inline asm: https://gcc.gnu.org/wiki/DontUseInlineAsm . (You can mask the shift count to avoid UB with shift counts that are too high, as for rotates: Best practices for circular shift (rotate) operations in C++ )
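For illustration, a masked shift in plain C might look like the sketch below. The name shr_masked is hypothetical. Masking with 63 keeps the count inside the range that is defined for a 64-bit operand, which is also what the hardware shr does with the count for 64-bit operands.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift with the count reduced mod 64,
   so a count of 64 or more is not undefined behavior in C. */
static inline uint64_t shr_masked(uint64_t v, unsigned c)
{
    return v >> (c & 63);
}
```

With this definition a count of 64 behaves like a count of 0, so shr_masked(1, 64) returns 1 rather than invoking UB.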
But if you want to use it as part of something that needs to be inline asm, then you can do it this way to allow the shift count to be a variable (in cl) or a constant (immediate) from C. I used a "cJ" constraint to allow a 0-63 immediate operand (J), or a register operand in rcx/ecx/cx/cl (the c constraint), specifically cl because I cast to (uint8_t).
Also, I used a b modifier to override the operand size, in case you want to use the whole rcx as a named input for something else before you get to the shift. (See 6.45.2.8 x86 Operand Modifiers in the gcc docs.)
See also the inline-assembly tag wiki for some guides.
I used https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates to let this compile and assemble correctly with AT&T or Intel syntax mode.
On the Godbolt compiler explorer, you can see this works with gcc, but clang doesn't work correctly with -masm=intel for inline asm: it still substitutes in %rdi instead of rdi and fails to assemble.
static inline uint64_t shr (uint64_t v, unsigned c)
{
    // %b[c] is cl even if %[c] is ecx or whatever.
    asm ("shr {%b[c],%[v] | %[v],%b[c]}"
        : [v] "+r" (v)
        : [c] "cJ" ((uint8_t)c)); // the cast gets this to use cl
    return v;
}
uint64_t shr_variable(uint64_t v, int c) {
    return shr(v, c);
}
mov rax, rdi
mov ecx, esi
shr rax,cl
ret
uint64_t shr_const(uint64_t v) {
    return shr(v, 13);
}
mov rax, rdi
shr rax,13
ret
Compare this with pure C, compiled with -march=haswell:
// can use SHRX with BMI2 available. And can optimize much better
uint64_t shr_variable_purec(uint64_t v, unsigned c) {
    //c &= 63; // optional, compiles to zero instructions on x86 because shr and shrx already do this.
    return v >> c;
}
shrx rax, rdi, rsi
ret