I'm having problems with assembly. I'm currently using the VC++ 2015 x86 compiler and its inline assembler to translate a function from C. The C function runs perfectly and as intended:
void calculateCRC(Info *data, int lonDiv, unsigned char divisor)
{
    unsigned char a = divisor;
    unsigned char b = 0;
    for (int i = 0; i < data->dataLength; i++)
    {
        unsigned char c = 128;
        while (a != 0)
        {
            if (data->content[i] & c)
            {
                data->content[i] = data->content[i] ^ a;
                data->content[i + 1] = data->content[i + 1] ^ b;
            }
            char carry = a << 7;
            b = b >> 1;
            b = b | carry;
            a = a >> 1;
            c = c >> 1;
        }
        a = b;
        b = 0;
    }
}
Some side notes:
The Info struct only has an int declared dataLength and a pointer to a char array declared as *content.
I know that the lonDiv parameter is not being used, yet I can't change the function declarations.
With that in mind, I translated it as best I could, but it does not work. The function fails and the whole program crashes without even returning an error code. The code is as follows:
void calculateCRC(Info *data, int lonDiv, unsigned char divisor)
{
__asm {
push ax // a and b
push dx // content[i] and content[i+1]
push cx // c and carry
push esi // Index var
push edi // Pointer to content
push ebx // Pointer to dataLength
mov ah, [ebp + 16] //a = divisor
mov al, 0 // b = 0
mov esi, 0 //int i = 0
forLoop: // for
mov ebx, [ebp + 8]
cmp esi, [ebx] // i < data->dataLength
jz end
whileLoop:
mov ch, 128 // c = 128
cmp ah, 0 // while (a != 0)
je cutWhile
mov edi, [ebp + 8] // Info *data
add edi, 4 // data->content
mov dh, [edi + esi] // content[i]
mov dl, [edi + esi + 1] // content[i+1]
test dh, ch // if (content[i] & c)
jz next
xor dh, ah // data->content[i] ^ a;
xor dl, al // data->content[i + 1] ^ b;
mov [edi + esi], dh // data->content[i] = data->content[i] ^ a;
mov [edi + esi + 1], dl // data->content[i + 1] = data->content[i + 1] ^ b;
next:
mov cl, ah // char carry = a
shl cl, 7 // carry = carry << 7
shr al, 1 // b = b >> 1
or al, cl // b = b | carry
shr ah, 1 // a = a >> 1
shr ch, 1 // c = c >> 1
jmp whileLoop
cutWhile:
shl ax, 8 // a = b and b = 0
inc esi //i++
jmp forLoop
end:
pop ebx
pop edi
pop esi
pop cx
pop dx
pop ax
ret
}
}
I am no assembly expert; I have tried to translate and comment everything, but to no avail. Any help would be much appreciated!
(turning comments into an answer; the Godbolt links are a bit random and scattered.)
If you want to write a whole function yourself in asm, including a ret, use __declspec(naked) to stop the compiler from emitting any prologue/epilogue, so the entire function body is really just your asm block. Then you need to set up ebp as a frame pointer yourself, if you want to use it that way.
(Or not; this function needs a lot of registers, so you could omit the frame pointer like optimizing compilers do. But you can save regs by optimizing away the carry into a simple 16-bit shift, with your vars in AH, AL, CH, CL, and so on, and take advantage of 16-bit or 32-bit operand-size to do two 8-bit operations at once.)
Using your own ret is a bug unless you're writing a naked function, because ESP is still pointing at some regs the compiler saved rather than at the return address.
Use Visual Studio's debugger to see what's really going on, preferably with the asm / disassembly view so you can see your inline asm in the context of the surrounding compiler-generated code. (You'll see your ret, followed by the compiler-generated pops and the compiler-generated ret that the compiler expected to run after your asm block finished with ESP unmodified.)
I'd recommend not using your own push/pop or your own ret (so not a naked function), and using the C++ variable names inside your asm statement so you can watch them in the debugger by their C++ names instead of looking at raw stack memory. (In the debugger's asm / disassembly view, you can see that something like mov ebx, data probably still assembles to the mov ebx, [ebp + 8] you wrote.)
As Jester points out, add edi, 4 is also a bug: that would make sense if you had struct Info { int len; uint8_t content[1020]; }; so the content address is just a small offset from the base of the struct. But you said you have uint8_t *content; so you need another load from the struct, like mov edi, [edi+4]. Look at what compilers do for the pure C++ version: https://godbolt.org/z/aVdsED (or use a lower optimization level like -O1).
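To illustrate the same difference in plain C rather than asm, here is a small sketch. The struct names InfoInline and InfoPtr and the two helper functions are hypothetical, made up for this illustration; only the two member layouts come from the discussion above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout 1: the buffer is stored inside the struct itself. */
struct InfoInline {
    int len;
    uint8_t content[1020];
};

/* Hypothetical layout 2: the struct only holds a pointer to the buffer. */
struct InfoPtr {
    int len;
    uint8_t *content;
};

/* Inline-array case: the data address is base + constant offset,
   with no memory load (comparable to add edi, 4 on 32-bit x86). */
uint8_t *first_byte_inline(struct InfoInline *p)
{
    return (uint8_t *)p + offsetof(struct InfoInline, content);
}

/* Pointer case: the data address has to be loaded from the struct
   (comparable to mov edi, [edi + 4] on 32-bit x86). */
uint8_t *first_byte_ptr(struct InfoPtr *p)
{
    return p->content;
}
```

With the pointer layout, adding a constant to the struct address only gives you the address of the pointer member, not of the data it points to, which is why the original asm read and wrote the wrong memory.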
You don't need to keep reloading this pointer inside the loop; that doesn't make sense. Keep the current position in content in a register, and keep an end-pointer in another register (or in memory if you run out of registers), i.e. spill something that's read-only and can be used as a memory operand inside the loop. So at worst your outer loop condition becomes cmp edi, [endp] / jb top_of_loop instead of a compare between two registers.
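As a rough C-level sketch of that loop shape (not the asm itself): the function name calculateCRC_ptr is made up here, the Info declaration is assumed from the question's description, and the buffer is assumed to have one accessible byte past dataLength, which the original content[i + 1] indexing already requires.

```c
/* Assumed layout, based on the description in the question. */
typedef struct Info {
    int dataLength;
    unsigned char *content;
} Info;

/* Same logic as the original C function, but walking the buffer with a
   current pointer and an end pointer that are both computed only once. */
void calculateCRC_ptr(Info *data, int lonDiv, unsigned char divisor)
{
    (void)lonDiv;                               /* unused, kept for the signature */
    unsigned char a = divisor;
    unsigned char b = 0;
    unsigned char *p = data->content;           /* current position */
    unsigned char *end = p + data->dataLength;  /* one past the last data byte */

    for (; p < end; p++) {
        unsigned char c = 128;
        while (a != 0) {
            if (p[0] & c) {
                p[0] ^= a;
                p[1] ^= b;
            }
            unsigned char carry = (unsigned char)(a << 7);  /* low bit of a */
            b = (unsigned char)((b >> 1) | carry);
            a >>= 1;
            c >>= 1;
        }
        a = b;
        b = 0;
    }
}
```

In asm, p and end would correspond to the two registers (or one register and one memory operand) mentioned above, so nothing has to be reloaded from the struct inside either loop.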
Not a real bug (just overcomplication), but you don't need to manually push/pop registers at the start/end of your asm block (unless you write a naked function). MSVC will generate code in the function containing the asm block to save/restore any registers that need saving, which means they're also free for compiler-generated use.
In fact it will do that anyway because pop edi writes EDI, and it doesn't try to prove that the inline asm matches up push and pop and doesn't modify the saved copy in memory. See https://godbolt.org/z/j5b4W- note the compiler-generated part of the compiler output.
It looks like you can simplify that manual-carry stuff into shr ax, 1, with a:b in AH:AL = AX. A lot of your other instructions are doing 16-bit stuff 8 bits at a time, like loading AL and AH from 2 contiguous bytes in memory, which could be mov ax, mem.
But for the actual memory access to content[], compilers don't do it that way. Instead they notice that content[i+1] in one iteration is content[i] in the next iteration, and use mov reg,reg instead of storing and reloading: https://godbolt.org/z/aVdsED.
That is much better; overlapping store/reload is inefficient. But anyway, using ax as a:b would be efficient, shifting 1 bit at a time until test ah,ah finds you've shifted all the bits out of AH.
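The a:b idea can be sketched in C by holding both bytes in one 16-bit variable. This is only an illustration under assumptions: the name calculateCRC_ab16 is hypothetical, the Info declaration is assumed from the question's description, and the buffer is assumed to have one accessible byte past dataLength.

```c
#include <stdint.h>

/* Assumed layout, based on the description in the question. */
typedef struct Info {
    int dataLength;
    unsigned char *content;
} Info;

/* Same logic as the original C function, with a in the high byte and
   b in the low byte of a single 16-bit value (like AH:AL = AX). */
void calculateCRC_ab16(Info *data, int lonDiv, unsigned char divisor)
{
    (void)lonDiv;                                /* unused, kept for the signature */
    uint16_t ab = (uint16_t)(divisor << 8);      /* a = divisor, b = 0 */

    for (int i = 0; i < data->dataLength; i++) {
        unsigned char c = 128;
        while (ab & 0xFF00) {                    /* while (a != 0), like test ah, ah */
            if (data->content[i] & c) {
                data->content[i]     ^= (unsigned char)(ab >> 8);    /* ^= a */
                data->content[i + 1] ^= (unsigned char)(ab & 0xFF);  /* ^= b */
            }
            ab >>= 1;   /* one shift: the low bit of a becomes the top bit of b */
            c >>= 1;
        }
        ab = (uint16_t)(ab << 8);                /* a = b; b = 0, like shl ax, 8 */
    }
}
```

The single ab >>= 1 replaces the separate carry, b and a updates, which is what shr ax, 1 would do in the asm version.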
After some reading and with the kind help of the community, I finally managed to get it running as intended. The fixed code is as follows:
void calculateCRC(Info *data, int lonDiv, unsigned char divisor)
{
__asm {
mov eax, 0
mov ah, [ebp + 16] //a = divisor
mov al, 0 // b = 0
mov esi, 0 //int i = 0
forLoop: // for
mov ebx, [ebp + 8]
cmp esi, [ebx] // i < data->dataLength
jz end
mov ch, 128 // c = 128
whileLoop:
cmp ah, 0 // while (a != 0)
je cutWhile
mov edi, [ebp + 8] // Info* data
mov edi, [edi + 4] // data->content
mov dh, [edi + esi] // content[i]
mov dl, [edi + esi + 1] // content[i+1]
test dh, ch // if (content[i] & c)
jz next
xor dh, ah // data->content[i] ^ a;
xor dl, al // data->content[i + 1] ^ b;
mov [edi + esi], dh // data->content[i] = data->content[i] ^ a;
mov [edi + esi + 1], dl // data->content[i + 1] = data->content[i + 1] ^ b;
next:
mov cl, ah // char carry = a
shl cl, 7 // carry = carry << 7
shr al, 1 // b = b >> 1
or al, cl // b = b | carry
shr ah, 1 // a = a >> 1
shr ch, 1 // c = c >> 1
jmp whileLoop
cutWhile:
shl ax, 8 // a = b and b = 0
inc esi //i++
jmp forLoop
end:
}
}
Changes:
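Removed the push/pop operations and let the compiler do that job.
Removed the ret operation.
Changed add edi, 4 to mov edi, [edi + 4].
Moved mov ch, 128 above the whileLoop label, so c is no longer reset to 128 on every pass through the while loop.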
There is still room for improvement, such as the manual carry operation pointed out by Peter in his comments and subsequent answer, but I will leave it as is for the time being. Thanks a lot to Jester and Peter Cordes for their time and comments.