
How can I pass an immediate value to shr in assembly in Intel syntax?

According to both this reference and this reference, the shr instruction supports shifting by 1, by the CL register, or by an immediate value. However, I cannot get the immediate form to work. With this code:

#include <stdint.h>

int main() {
  uint64_t v = 15;
  asm ("shr %[v], $0x04\t\n"
       : [v] "+r" (v)
       :
       : "cc"
       );
  return v;
}

I get this error message:

$ gcc -masm=intel foo.c
foo.c: Assembler messages:
foo.c:5: Error: operand size mismatch for `shr'

How can I pass an immediate value to shr without loading it into CL? I care about this because register pressure is the bottleneck I'm optimizing for.

The guides you're using describe Intel assembly syntax. The GNU assembler (GAS) defaults to AT&T syntax, which puts the operands in the opposite order (source first, destination last). Swapping them works:

uint64_t v = 0xffff;
asm ("shr $0x04, %[v]\n"
   : [v] "+r" (v)
   :
   : "cc"
   );
printf("%llx", (unsigned long long)v);  // prints fff

(You can also write shrq instead of shr to make the 64-bit operand size explicit.)

If you still want Intel syntax, as you get with -masm=intel, drop the dollar sign from the immediate value:

asm ("shr %[v], 4\n"
     ...)

Using -masm=intel puts the assembler in .intel_syntax noprefix mode, where immediates take no $ prefix. (To use a symbol's address as an immediate, you need OFFSET symbol.) So just leave out the $.
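One way to avoid the prefix question entirely is to pass the count as an asm operand and let the compiler print it in whichever dialect is active. The following is a minimal sketch, not a definitive implementation: shr4 is a hypothetical helper name, the count 4 is taken from the question, and the #else branch is an assumed fallback so the sketch still builds on non-x86 targets.

#include <stdint.h>

/* Hypothetical helper: logical right shift by a fixed count of 4. */
static inline uint64_t shr4(uint64_t v)
{
#if defined(__x86_64__)
    /* {att | intel} picks the operand order for the active dialect.
       The "J" constraint accepts a constant 0..63; the compiler prints
       it as $4 in AT&T mode and as 4 in Intel mode. */
    __asm__ ("shr {%[c], %[v] | %[v], %[c]}"
             : [v] "+r" (v)
             : [c] "J" (4)
             : "cc");
#else
    v >>= 4;  /* assumed fallback for non-x86 targets */
#endif
    return v;
}

With this form the template never contains a literal immediate, so the same source should assemble with or without -masm=intel under gcc.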

Obviously, if all you want is a shift, do it in C instead of inline asm: https://gcc.gnu.org/wiki/DontUseInlineAsm . (You can mask the shift count to avoid UB when the count is too high, as is done for rotates: Best practices for circular shift (rotate) operations in C++ )
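As a rough illustration of the masking idea, here is a pure C shift that stays well-defined for any count. The name shr_masked is hypothetical. Reducing the count modulo 64 matches what the 64-bit hardware shift does with its count, but whether wrapping is the behavior you want for counts of 64 or more is an assumption you should check against your use case.

#include <stdint.h>

/* Hypothetical helper: logical right shift with the count reduced
   modulo 64, so counts >= 64 never reach the C shift operator. */
static inline uint64_t shr_masked(uint64_t v, unsigned c)
{
    return v >> (c & 63);
}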


But if you want to use it as part of something that needs to be inline asm, then you can do it this way to allow the shift count to be a variable (in cl) or a constant (immediate) from C. I used a "cJ" constraint to allow a 0-63 immediate operand (J), or a register operand in rcx/ecx/cx/cl (c constraint), specifically in cl because I cast to (uint8_t).

Also, I used a b modifier to override the size, in case you wanted to use the whole rcx as a named input for something else before you get to the shift. (See 6.45.2.8 x86 Operand Modifiers in the gcc docs ).

See also the tag wiki for some guides.

I used https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates to let this compile and assemble correctly with AT&T or Intel syntax mode.

On the Godbolt compiler explorer , you can see this works with gcc, but clang doesn't work correctly with -masm=intel for inline-asm. It still substitutes in %rdi instead of rdi and fails to assemble.

static inline uint64_t shr (uint64_t v, unsigned c)
{
    // %b[c] is cl even if %[c] is ecx or whatever.
    asm ("shr  {%b[c],%[v] | %[v],%b[c]}"
         : [v] "+r" (v) 
         : [c] "cJ" ((uint8_t)c));  // the cast gets this to use cl
    return v;
}

uint64_t shr_variable(uint64_t v, int c) {
    return shr(v, c);
}

    mov     rax, rdi
    mov     ecx, esi
    shr   rax,cl
    ret


uint64_t shr_const(uint64_t v) {
    return shr(v, 13);
}

    mov     rax, rdi
    shr   rax,13
    ret

Compare this with pure C, with -march=haswell :

// can use SHRX with BMI2 available.  And can optimize much better
uint64_t shr_variable_purec(uint64_t v, unsigned c) {
    //c &= 63;  // optional, compiles to zero instructions on x86 because shr and shrx already do this.
    return v >> c;
}

    shrx    rax, rdi, rsi
    ret
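To convince yourself that the inline-asm wrapper and the plain C shift compute the same thing, a small cross-check can help. This is only a sketch: shr_asm mirrors the wrapper shown earlier under a different name, shr_agrees is a hypothetical helper, and the #else branch is an assumed fallback for non-x86 targets.

#include <stdint.h>

/* Same idea as the shr wrapper above, renamed for this sketch. */
static inline uint64_t shr_asm(uint64_t v, unsigned c)
{
#if defined(__x86_64__)
    /* %b[c] prints the byte register (cl) when c lives in rcx. */
    __asm__ ("shr {%b[c], %[v] | %[v], %b[c]}"
             : [v] "+r" (v)
             : [c] "cJ" ((uint8_t)c)
             : "cc");
    return v;
#else
    return v >> (c & 63);  /* assumed fallback for non-x86 targets */
#endif
}

/* Hypothetical check: returns 1 if the asm and C versions agree
   for every shift count 0..63 applied to v, else 0. */
static int shr_agrees(uint64_t v)
{
    for (unsigned c = 0; c < 64; ++c)
        if (shr_asm(v, c) != (v >> c))
            return 0;
    return 1;
}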
