Inline assembler method crashing the entire program

Question

I'm having problems with assembly. I'm using the VC++ 2015 x86 compiler and inline assembler to translate a function from C. The C function works perfectly and as intended:

void calculateCRC(Info *data, int lonDiv, unsigned char divisor)
{
    unsigned char a = divisor;
    unsigned char b = 0;

    for (int i = 0; i < data->dataLength; i++)
    {
        unsigned char c = 128;
        while (a != 0)
        {
            if (data->content[i] & c)
            {
                data->content[i] = data->content[i] ^ a;
                data->content[i + 1] = data->content[i + 1] ^ b;
            }
            char carry = a << 7;
            b = b >> 1;
            b = b | carry;
            a = a >> 1;
            c = c >> 1;
        }
        a = b;
        b = 0;
    }
}

Some side notes:

The Info struct contains only an int named dataLength and a pointer to a char array named content.
I know that the lonDiv parameter is not used, but I can't change the function declaration.

With that in mind, I translated it as best I could, but it doesn't work. The function fails and the whole program crashes without even returning an error code. The code is as follows:

void calculateCRC(Info *data, int lonDiv, unsigned char divisor)
{
    __asm {
        push ax // a and b
        push dx // content[i] and content[i+1]
        push cx // c and carry
        push esi // Index var
        push edi // Pointer to content
        push ebx // Pointer to dataLength

        mov ah, [ebp + 16] //a = divisor
        mov al, 0 // b = 0
        mov esi, 0 //int i = 0

        forLoop: // for
            mov ebx, [ebp + 8]
            cmp esi, [ebx] // i < data->dataLength

            jz end
            whileLoop:
                mov ch, 128 // c = 128
                cmp ah, 0 // while (a != 0)
                je cutWhile
            mov edi, [ebp + 8] // Info *data
            add edi, 4 // data->content
            mov dh, [edi + esi] // content[i]
            mov dl, [edi + esi + 1] // content[i+1]
            test dh, ch // if (content[i] & c)
            jz next
                xor dh, ah // data->content[i] ^ a;
                xor dl, al // data->content[i + 1] ^ b;
                mov [edi + esi], dh // data->content[i] = data->content[i] ^ a;
                mov [edi + esi + 1], dl // data->content[i + 1] = data->content[i + 1] ^ b;
                next:
                    mov cl, ah // char carry = a
                    shl cl, 7 // carry = carry << 7
                    shr al, 1 // b = b >> 1
                    or al, cl // b = b | carry
                    shr ah, 1 // a = a >> 1
                    shr ch, 1 // c = c >> 1
                    jmp whileLoop
            cutWhile:
                shl ax, 8 // a = b and b = 0
                inc esi //i++
                jmp forLoop
        end:
            pop ebx
            pop edi
            pop esi
            pop cx
            pop dx
            pop ax
            ret
    }
}

I'm no assembly expert, but I have translated and commented everything as carefully as I could, to no avail. Any help would be very much appreciated!

Answer 1

(turning comments into an answer; the Godbolt links are a bit random and scattered.)

If you want to write a whole function yourself in asm, including a ret, use __declspec(naked) to stop the compiler from emitting any prologue/epilogue, so that the entire function body really is just your asm block. Then you need to set up EBP as a frame pointer yourself if you want to use it that way.

(Or not; this function needs a lot of registers, so you could omit the frame pointer like optimizing compilers do. But you can save registers by optimizing the carry away into a simple 16-bit shift, with your variables in AH, AL, CH, CL, and so on, and by taking advantage of 16-bit or 32-bit operand size to do two 8-bit operations at once.)
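The following small C check shows why the manual carry can become a single 16-bit shift. It assumes a sits in the high byte and b in the low byte of one 16-bit value, the same way AH and AL make up AX. The function names are hypothetical. The check runs the question's three-step update and ab >> 1 on every possible (a, b) pair and compares the results.

```c
#include <assert.h>
#include <stdint.h>

/* The question's per-step update of a and b, written out literally. */
void step_manual(unsigned char *a, unsigned char *b)
{
    unsigned char carry = (unsigned char)(*a << 7); /* low bit of a moved to bit 7 */
    *b = (unsigned char)((*b >> 1) | carry);        /* b = (b >> 1) | carry */
    *a = (unsigned char)(*a >> 1);                  /* a = a >> 1 */
}

/* Same update with a in the high byte and b in the low byte (AH:AL = AX). */
uint16_t step_16bit(uint16_t ab)
{
    return (uint16_t)(ab >> 1);                     /* corresponds to shr ax, 1 */
}

/* Tries all 65536 (a, b) pairs; returns 1 if both forms always agree, else 0. */
int carry_equals_16bit_shift(void)
{
    for (unsigned a0 = 0; a0 < 256; a0++) {
        for (unsigned b0 = 0; b0 < 256; b0++) {
            unsigned char a = (unsigned char)a0;
            unsigned char b = (unsigned char)b0;
            step_manual(&a, &b);
            uint16_t ab = step_16bit((uint16_t)((a0 << 8) | b0));
            if (a != (ab >> 8) || b != (ab & 0xFF))
                return 0;
        }
    }
    return 1;
}
```

The two forms agree on every input. So the five instructions mov cl, ah / shl cl, 7 / shr al, 1 / or al, cl / shr ah, 1 can become a single shr ax, 1.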

Unless you're writing a naked function, using your own ret is a bug: at that point ESP points at registers the compiler saved, not at the return address.

Use Visual Studio's debugger to see what's really going on, preferably with an asm/disassembly view, so you can see the code from your inline asm in the context of the surrounding compiler-generated code. (You'll see your ret, followed by the compiler-generated pops and the compiler-generated ret that it expected to run after your asm block finished with ESP unmodified.)

I'd recommend not using your own push/pop or your own ret (so not a naked function), and using the C++ variable names inside your asm statement. Then you can watch them in the debugger by their C++ names instead of looking at raw stack memory at all. (In the debugger's asm/disassembly view, you can see that something like mov ebx, data probably still compiles to the same mov ebx, [ebp + 8] you wrote.)

As Jester points out, add edi, 4 is also a bug. It would make sense if you had struct Info { int len; uint8_t content[1020]; } (an array stored inside the struct), so that the content address is just a small offset from the base of the struct. But you said you have a pointer, uint8_t *content, so you need another load from the struct, like mov edi, [edi+4]. Look at what compilers do for the pure C++ version: https://godbolt.org/z/aVdsED (or use a lower optimization level like -O1).
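A C sketch of the two layouts makes the difference concrete. The struct and function names are hypothetical, and unsigned char * stands in for the question's char pointer. With an inline array, the data's address is simply base + offset, which is what add edi, 4 computes. With a pointer member, the bytes at base + offset are themselves an address that has to be loaded, which is what mov edi, [edi+4] does. On 32-bit x86 that offset is 4. On a typical 64-bit target the pointer member would sit at offset 8 instead.

```c
#include <assert.h>
#include <stddef.h>

/* Layout where "add edi, 4" would be right: the bytes live inside the struct. */
typedef struct {
    int len;
    unsigned char content[1020];
} InfoArray;

/* Layout described in the question: the struct only holds a pointer to the bytes. */
typedef struct {
    int dataLength;
    unsigned char *content;
} InfoPtr;

/* Like "add edi, 4": the data starts at a fixed offset from the struct itself. */
unsigned char *content_of_array(InfoArray *d)
{
    return (unsigned char *)d + offsetof(InfoArray, content);
}

/* Like "mov edi, [edi + 4]": read the pointer stored at that offset and use it. */
unsigned char *content_of_ptr(InfoPtr *d)
{
    unsigned char **field = (unsigned char **)((char *)d + offsetof(InfoPtr, content));
    return *field;
}
```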

You don't need to keep reloading this pointer inside the loop; that doesn't make sense. Keep the current position in content in a register, and keep an end pointer in another register (or in memory if you run out of registers). In other words, spill something that's read-only and can be used as a memory operand inside the loop. So at worst your outer loop becomes cmp edi, [endp] / jb top_of_loop instead of a compare between two registers.
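In C terms, this advice amounts to loading data->content once and walking a pointer up to an end pointer. The sketch below assumes the Info layout described in the question. The name calculateCRC_endptr is hypothetical; the signature is otherwise unchanged. The loop body is the question's algorithm, and only the addressing changes.

```c
#include <assert.h>

/* Assumed layout, per the question: a length and a pointer to the bytes. */
typedef struct {
    int dataLength;
    unsigned char *content;
} Info;

/* Same algorithm as the question's C version, but content is loaded once and the
   outer loop advances a pointer p up to an end pointer instead of indexing with i.
   Like the original, it may touch content[dataLength], so the buffer needs
   dataLength + 1 bytes. */
void calculateCRC_endptr(Info *data, int lonDiv, unsigned char divisor)
{
    unsigned char *p = data->content;          /* loaded once, like mov edi, [edi+4] */
    unsigned char *end = p + data->dataLength; /* one past the last data byte */
    unsigned char a = divisor;
    unsigned char b = 0;
    (void)lonDiv;                              /* unused, as in the question */

    for (; p < end; p++) {
        unsigned char c = 128;
        while (a != 0) {
            if (p[0] & c) {
                p[0] ^= a;                     /* content[i] ^= a */
                p[1] ^= b;                     /* content[i + 1] ^= b */
            }
            /* b = (b >> 1) | carry, where carry is a's low bit moved to bit 7 */
            b = (unsigned char)((b >> 1) | (a << 7));
            a >>= 1;
            c >>= 1;
        }
        a = b;
        b = 0;
    }
}
```

Compiled for 32-bit x86, p and end would map onto the two values the answer suggests keeping in registers (or end in memory).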

Not a real bug (just overcomplication), but you don't need to manually push/pop registers at the start/end of your asm block (unless you write a naked function). MSVC generates code in the function containing the asm block to save/restore any registers that need saving, which also means those registers are free for compiler-generated code to use.

In fact, it will do that anyway: pop edi writes EDI, and MSVC doesn't try to prove that the inline asm's pushes and pops match up without modifying the saved copy in memory. See https://godbolt.org/z/j5b4W- and note the compiler-generated part of the output.

It looks like you can simplify that manual-carry code into shr ax, 1, with a:b in AH:AL = AX. A lot of your other instructions are doing 16-bit work 8 bits at a time, like loading DH and DL from 2 contiguous bytes in memory, which could be a single 16-bit load such as mov dx, mem.
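Here is the same idea in C. It keeps a:b in one uint16_t (like AX) and treats content[i]:content[i+1] as one 16-bit window, so each XOR is done once instead of once per byte. This is a sketch under the question's Info layout, with a hypothetical function name. The bytes are combined with shifts rather than a raw 16-bit load because x86 is little-endian: a load like mov dx, [edi+esi] puts content[i] in DL, not DH.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, per the question. */
typedef struct {
    int dataLength;
    unsigned char *content;
} Info;

/* a:b kept together in one 16-bit value, like a in AH and b in AL.
   Reads content[i + 1] for every i, so the buffer needs dataLength + 1 bytes. */
void calculateCRC_ab(Info *data, int lonDiv, unsigned char divisor)
{
    unsigned char *p = data->content;
    uint16_t ab = (uint16_t)(divisor << 8);  /* a = divisor, b = 0 */
    (void)lonDiv;                            /* unused, as in the question */

    for (int i = 0; i < data->dataLength; i++) {
        /* content[i]:content[i+1] as one value, content[i] in the high byte */
        uint16_t w = (uint16_t)((p[i] << 8) | p[i + 1]);
        uint16_t c = 0x8000;                 /* c = 128, moved up to the high byte */
        while (ab >> 8) {                    /* while (a != 0), i.e. test ah, ah */
            if (w & c)
                w ^= ab;                     /* both byte XORs in one operation */
            ab >>= 1;                        /* shr ax, 1 replaces the manual carry */
            c >>= 1;
        }
        p[i] = (unsigned char)(w >> 8);
        p[i + 1] = (unsigned char)w;
        ab = (uint16_t)(ab << 8);            /* a = b, b = 0 (a is already 0 here) */
    }
}
```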

But for the actual memory access to content[], compilers don't do it that way. Instead, they notice that content[i+1] in one iteration is content[i] in the next, and use mov reg, reg instead of storing and reloading (https://godbolt.org/z/aVdsED).

This is much better; storing and then reloading overlapping bytes is inefficient. Either way, using AX as a:b would be efficient: shift it right 1 bit at a time until test ah, ah finds that you've shifted all the bits out.
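The following C sketch shows that compiler-style loop, again assuming the question's Info layout and using a hypothetical function name. The byte that becomes content[i] on the next iteration stays in a local variable (a register, once compiled), so each byte is stored exactly once. a:b lives in one 16-bit value, and its high byte is tested to end the inner loop.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, per the question. */
typedef struct {
    int dataLength;
    unsigned char *content;
} Info;

/* content[i + 1] is loaded once and carried into the next iteration as content[i],
   instead of being stored and reloaded. Needs dataLength + 1 bytes of buffer. */
void calculateCRC_carried(Info *data, int lonDiv, unsigned char divisor)
{
    (void)lonDiv;                                 /* unused, as in the question */
    if (data->dataLength <= 0)
        return;                                   /* nothing to process */

    unsigned char *p = data->content;
    unsigned char *end = p + data->dataLength;
    uint16_t ab = (uint16_t)(divisor << 8);       /* a = divisor, b = 0 */
    unsigned char cur = p[0];                     /* current content[i] */

    for (; p < end; p++) {
        unsigned char next = p[1];                /* content[i + 1], loaded once */
        unsigned char c = 128;
        while (ab >> 8) {                         /* while (a != 0) */
            if (cur & c) {
                cur ^= (unsigned char)(ab >> 8);  /* content[i] ^= a */
                next ^= (unsigned char)ab;        /* content[i + 1] ^= b */
            }
            ab >>= 1;                             /* a:b shifted right as one value */
            c >>= 1;
        }
        *p = cur;                                 /* content[i] is final now */
        cur = next;                               /* becomes content[i] next time */
        ab = (uint16_t)(ab << 8);                 /* a = b, b = 0 */
    }
    *end = cur;                                   /* store the last content[i + 1] */
}
```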

Answer 2

After some reading and with the kind help of the community, I finally got it running as intended. The fixed code is as follows:

void calculateCRC(Info *data, int lonDiv, unsigned char divisor)
{   
    __asm {
        mov eax, 0
        mov ah, [ebp + 16] //a = divisor
        mov al, 0 // b = 0
        mov esi, 0 //int i = 0

        forLoop: // for
            mov ebx, [ebp + 8]
            cmp esi, [ebx] // i < data->dataLength

            jz end
            mov ch, 128 // c = 128
            whileLoop:
                cmp ah, 0 // while (a != 0)
                je cutWhile
            mov edi, [ebp + 8] // Info* data
            mov edi, [edi + 4] // data->content
            mov dh, [edi + esi] // content[i]
            mov dl, [edi + esi + 1] // content[i+1]
            test dh, ch // if (content[i] & c)
            jz next
                xor dh, ah // data->content[i] ^ a;
                xor dl, al // data->content[i + 1] ^ b;
                mov [edi + esi], dh // data->content[i] = data->content[i] ^ a;
                mov [edi + esi + 1], dl // data->content[i + 1] = data->content[i + 1] ^ b;
                next:
                    mov cl, ah // char carry = a
                    shl cl, 7 // carry = carry << 7
                    shr al, 1 // b = b >> 1
                    or al, cl // b = b | carry
                    shr ah, 1 // a = a >> 1
                    shr ch, 1 // c = c >> 1
                    jmp whileLoop
            cutWhile:
                shl ax, 8 // a = b and b = 0
                inc esi //i++
                jmp forLoop
        end:
    }
}

Changes:

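Removed the push/pop operations and let the compiler do that job.
Removed the ret instruction.
Changed add edi, 4 to mov edi, [edi + 4].
Moved mov ch, 128 above the whileLoop label, so c is reset once per byte (as in the C code) instead of on every pass through the while loop.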

There is still room for improvement, such as the manual carry operation Peter pointed out in his comments and answer. I will leave it as is for the time being, though. Thanks a lot to Jester and Peter Cordes for their time and comments.

2 answers

solution1
3 2019-11-27 03:33:04

solution2
2 2019-11-27 18:14:07
