GCC inline assembly read value from array

Question

While learning gcc inline assembly I was playing a bit with memory access. I'm trying to read a value from an array using a value from a different array as index. Both arrays are initialized to something.

Initialization:

uint8_t* index = (uint8_t*)malloc(256);
memset(index, 33, 256);

uint8_t* data = (uint8_t*)malloc(256);
memset(data, 44, 256);

Array access:

unsigned char read(void *index,void *data) {
        unsigned char value;

        asm __volatile__ (
        "  movzb (%1), %%edx\n"
        "  movzb (%2, %%edx), %%eax\n"
        : "=r" (value)
        : "c" (index), "c" (data)
        : "%eax", "%edx");

        return value;
    }

This is how I use the function:

unsigned char value = read(index, data);

I would expect it to return 44, but it actually returns a seemingly random value. Am I reading from uninitialized memory? Also, I'm not sure how to tell the compiler that it should assign the value from eax to the variable value.

Answer 1

You told the compiler you were going to put the output in %0, and that it could pick any register for that "=r" operand. But you never write %0 in your template.

And you use two temporaries for no apparent reason when you could have used %0 as the temporary.

As usual, you can debug your inline asm by adding comments like # 0 = %0 to the template and looking at the compiler's asm output (not disassembly, just gcc -S to see what it fills in), e.g. # 0 = %ecx. (You didn't use an early-clobber "=&r", so it can pick the same register as one of the inputs.)

Also, this has 2 other bugs:

1) It doesn't compile. Requesting 2 different operands in ECX with "c" constraints can't work unless the compiler can prove at compile time that they have the same value, so that %1 and %2 can be the same register. https://godbolt.org/z/LgR4xS
2) You dereference pointer inputs without telling the compiler you're reading the pointed-to memory. Use a "memory" clobber or dummy memory operands. See the Stack Overflow question "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?"

Or better, don't use inline asm at all (https://gcc.gnu.org/wiki/DontUseInlineAsm), because it's useless for this; just let GCC emit the movzb loads itself. Accesses through unsigned char* are safe from strict-aliasing UB, so you can cast any pointer to unsigned char* and dereference it, without having to use memcpy or other hacks to fight the language rules for wider unaligned or type-punned accesses.

But if you insist on inline asm, read manuals and tutorials, links at https://stackoverflow.com/tags/inline-assembly/info . You can't just throw code at the wall until it sticks with inline asm: you must understand why your code is safe to have any hope of it being safe. There are many ways for inline asm to happen to work but actually be broken, or be waiting to break with different surrounding code.

This is a safe and not totally terrible version (other than the unavoidable optimization-defeating parts of inline asm). You do still want a movzbl load for both loads, even though the return value is only 8 bits. movzbl is the natural efficient way to load a byte, replacing instead of merging with the old contents of a full register.

unsigned char read(void *index, void *data)
{
    uintptr_t value;
    asm (
        " movzb (%[idx]), %k[out] \n\t"
        " movzb (%[arr], %[out]), %k[out]\n"
        : [out] "=&r" (value)              // early-clobber output
        : [idx] "r" (index), [arr] "r" (data)
        : "memory"  // we deref some inputs as pointers
    );

    return value;
}

Note the early-clobber on the output: this stops gcc from picking the same register for output as one of the inputs. It would be safe for it to destroy the [idx] register with the first load, but I don't know how to tell GCC that in one asm statement. You could split your asm statement into two separate ones, each with their own input and output operands, connecting the output of the first to the input of the 2nd via a local variable. Then neither one would need early-clobber because they're just wrapping single instructions like GNU C inline asm syntax is designed to do nicely.

On Godbolt, with a test caller, you can see how it inlines and optimizes when called twice, with i386 clang and x86-64 gcc. For example, asking for index in a register forces an LEA, instead of letting the compiler see the dereference and pick an addressing mode for *index. The compiler also emits an extra movzbl %al, %eax when adding the result to an unsigned sum, because we used a narrow return type.

I used uintptr_t value so this can compile for 32-bit and 64-bit x86. There's no harm in making the output from the asm statement wider than the return value of the function, and that saves us from having to use size modifiers like movzbl (%1), %k0 to get GCC to print the 32-bit register name (like EAX) if it chose AL for an 8-bit output variable, for example.

I did decide to actually use %k[out] for the benefit of 64-bit mode: we want movzbl (%rdi), %eax, not movzb (%rdi), %rax (wasting a REX prefix).

You might as well declare the function to return unsigned int or uintptr_t, though, so the compiler knows that it doesn't have to redo the zero-extension. OTOH, sometimes it can help the compiler to know that the value range is only 0..255. You could tell it that you produce a correctly zero-extended value using if(retval>255) __builtin_unreachable() or something. Or you could just not use inline asm.

You don't need asm volatile . (Assuming you want to let it optimize away if the result is unused, or be hoisted out of loops for constant inputs). You only need a "memory" clobber so if it does get used, the compiler knows that it reads memory.

(A "memory" clobber counts as all memory being an input, and all memory being an output. So it can't CSE, eg hoist out of a loop, because as far as the compiler knows one invocation might read something a previous one wrote. So in practice a "memory" clobber is about as bad as asm volatile . Even two back-to-back calls to this function without touching the input array force the compiler to emit the instructions twice.)

You could avoid this with dummy memory-input operands so the compiler knows this asm block doesn't modify memory, only read it. But if you actually care about efficiency, you shouldn't be using inline asm for this.

But like I said, there is zero reason to use inline asm. This does exactly the same thing in 100% portable and safe ISO C:

// safe from strict-aliasing violations 
// because  unsigned char* can alias anything
static inline   // static: a plain C99 inline would also need an extern definition
unsigned char read(void *index, void *data) {
    unsigned idx = *(unsigned char*)index;
    unsigned char * dp = data;
    return dp[idx];
}

You could cast one or both pointers to volatile unsigned char* if you insist on the access happening every time and not being optimized away.
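
The volatile variant could be sketched like this in portable C; read_volatile is an illustrative name, not something from the question.

```c
/* Hypothetical name: both loads go through volatile-qualified pointers,
   so the compiler must perform them on every call. */
static inline unsigned char read_volatile(const void *index, const void *data)
{
    /* volatile load of the index byte */
    unsigned idx = *(const volatile unsigned char *)index;

    /* volatile load of the selected data byte */
    const volatile unsigned char *dp = data;
    return dp[idx];
}
```

With the question's initialization (index filled with 33, data filled with 44), a call returns 44, and a loop that calls it repeatedly reloads both bytes on every iteration.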

Or maybe even to atomic<unsigned char> * depending on what you're doing. (That's a hack, prefer C++20 atomic_ref to atomically load/store on objects that are normally not atomic.)

Answer 2

After reading the manuals a bit more carefully I came up with essentially 2 solutions:

1)

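unsigned char forward(void *index, void *data) {
    unsigned char value;

    // i386 only: the template hard-codes 32-bit registers for the pointers
    asm (
    "  mov %1, %%eax            \n"
    "  movzb (%%eax), %%eax     \n"
    "  mov %2, %%edx            \n"
    "  mov (%%edx, %%eax), %0   \n"
    : "=r" (value)
    : "m" (index), "m" (data)
    : "%eax", "%edx", "memory");  // EAX is modified too; "memory" because we deref pointers

    return value;
}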

Here I still have to tell gcc what registers to use.

2)

If I want to let gcc pick the register, I need to change the output operand's type to a 32-bit one. Otherwise the base+index addressing is not valid.

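unsigned char forward2(void *index, void *data) {
    size_t value;

    // i386 only: EDX is hard-coded to hold the data pointer
    asm (
    "  mov %1, %0           \n"
    "  movzb (%0), %0       \n"
    "  mov %2, %%edx        \n"
    "  movzb (%%edx, %0), %0  \n"   // byte load; a plain mov would read 4 bytes
    : "=&r" (value)                 // early-clobber: written before %2 is read
    : "m" (index), "m" (data)
    : "%edx", "memory");            // "memory" because we deref pointers

    return (unsigned char)value;
}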
