Translating O2 optimized for-loop from assembly to C

Question

This is a homework question. I am trying to extract information from the following assembly code (x86-64 Linux machine, compiled with gcc -O2). I have commented each section to show what I know. A big chunk of my assumptions could be wrong, but I have searched enough to know that I should ask these questions here.

.section        .rodata.str1.1,"aMS",@progbits,1

.LC0:
    .string "result %lx\n"  //Printed string at end of program
    .text

main:
.LFB13: 
    xorl    %esi, %esi         // value of esi = 0; x
    movl    $1, %ecx           // value of ecx = 1; result
    xorl    %edx, %edx         // value of edx = 0; Loop increment variable (possibly mask?)
.L2:
    movq    %rcx, %rax         // value of rax = 1; ?
    addl    $1, %edx           // value of edx = 1; Increment loop by one;
    salq    $3, %rcx           // value of rcx = 8; Shift left rcx;
    andl    $3735928559, %eax  // value of eax = 1; Value AND 1 = 1;
    orq     %rax, %rsi         // value of rsi = 1; 1 OR 0 = 1;
    cmpl    $22, %edx          // edx != 22 
    jne     .L2                // if true, go back to .L2 (loop again)
    movl    $.LC0, %edi        // Point to string
    xorl    %eax, %eax         // value of eax = 0;
    jmp     printf             // print
.LFE13: ret                    // return

And I am supposed to turn it into the following C code with the blanks filled in

#include <stdio.h>
int main()
{
 long x = 0x________;
 long result = ______;
 long mask;
 for (mask = _________; mask _______; mask = ________) {
   result |= ________;
}
 printf("result %lx\n",result);
}

I have a couple of questions and sanity checks that I want to make sure I am getting right since none of the similar examples I have found are for optimized code. Upon compiling some trials myself I get something close but the middle part of L2 is always off.

MY UNDERSTANDING

At the beginning, esi is XORed with itself, resulting in 0, which is represented by x. 1 is then moved into ecx, which would be represented by the variable result.

x = 0;     result = 1;

Then, I believe a loop increment variable is stored in edx and set to 0. This will be used in the third part of the for loop (update expression). I also think that this variable must be mask, because later on 1 is added to edx, signifying a loop increment (mask = mask++), along with edx being compared in the middle part of the for loop (test expression aka mask != 22).

mask = 0; (in a way)

The loop is then entered, with rax being set to 1. I don't understand where this is used at all, since I have not declared a fourth variable, although it shows up later to be ANDed and zeroed out.

movq %rcx, %rax;

The loop variable is then incremented by one

addl $1, %edx;

THE NEXT PART MAKES THE LEAST AMOUNT OF SENSE TO ME

The next three operations I feel make up the body expression of the loop, however I have no idea what to do with them. It would result in something similar to result |= x ... but I don't know what else

salq    $3, %rcx      
andl    $3735928559, %eax  
orq     %rax, %rsi

The rest I feel I have a good grasp on. A comparison is made (if mask != 22, loop again), and the results are printed.

PROBLEMS I AM HAVING

I don't understand a couple of things.

1) I don't understand how to figure out my variables. There seem to be 3 hardcoded ones along with one increment or temporary storage variable in the assembly (rax, rcx, rdx, rsi). I think rsi would be x and rcx would be result, yet I am unsure whether mask would be rdx or rax, and either way, what would the last variable be?

2) What do the 3 instructions I am unsure of do? I feel that I have them mixed up with the increment somehow, but without knowing the variables I don't know how to go about solving this.

Any and all help will be great, thank you!

Answer 1

The answer is:

#include <stdio.h>
int main()
{
    long x = 0xDEADBEEF;
    long result = 0;
    long mask;
    for (mask = 1; mask != 0; mask = mask << 3) {
        result |= mask & x;
    }
    printf("result %lx\n",result);
}

In the assembly:

rsi is result. We deduce that because it is the only value that gets ORed, and it is the second argument to printf (on x86-64 Linux, integer arguments are passed in rdi, rsi, rdx, and further registers, in that order).

x is a constant set to 0xDEADBEEF (3735928559 in decimal, the immediate in the andl instruction). This cannot be deduced with certainty, but it makes sense: x is initialized from a constant in the C template and is never modified afterwards.

The rest is obfuscated by an anti-optimization by GCC. GCC detected that the loop would be executed exactly 22 times, and thought it was clever to replace the original condition with a useless counter. Knowing that, we see that edx is the useless counter and rcx is mask. We can then deduce the real condition and the real "increment" operation. We can see the <<= 3 in the assembly, and notice that if you shift a 64-bit integer holding 1 left by 3 bits, 22 times, it becomes 0 (22 shifts of 3 bits is 66 bits, so the single set bit is shifted out). That gives the condition mask != 0.

This anti-optimization is sadly really common for GCC. The assembly can be replaced with:
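The iteration count can be checked with a minimal sketch. It assumes a 64-bit mask and uses uint64_t instead of long so that shifting the top bit out is well defined; the function name count_iterations is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how often the body of
 *     for (mask = 1; mask != 0; mask = mask << 3)
 * runs. uint64_t stands in for the 64-bit long of the homework template. */
static int count_iterations(void)
{
    int iterations = 0;
    for (uint64_t mask = 1; mask != 0; mask <<= 3) {
        /* mask always has exactly one bit set: 2^0, 2^3, ..., 2^63 */
        assert((mask & (mask - 1)) == 0);
        iterations++;
    }
    return iterations;
}
```

Calling count_iterations() returns 22, which matches the constant in cmpl $22, %edx: the last nonzero mask is 2^63, and the next shift pushes the bit out of the register.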

.LFB13: 
    xorl    %esi, %esi
    movl    $1, %ecx
.L2:
    movq    %rcx, %rax
    andl    $3735928559, %eax
    orq     %rax, %rsi
    salq    $3, %rcx // implicit test for 0
    jne     .L2
    movl    $.LC0, %edi
    xorl    %eax, %eax
    jmp     printf

It does exactly the same thing, but we removed the useless counter and saved 3 assembly instructions. It also matches the C code better.
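As a rough check of that claim, both loops can be transcribed instruction by instruction into C. This is only a sketch: the variables are named after the registers, uint64_t and uint32_t stand in for the 64-bit and 32-bit register views, and the function names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Loop as GCC emitted it: a separate counter in edx ends the loop. */
static uint64_t run_with_counter(void)
{
    uint64_t rsi = 0;   /* result */
    uint64_t rcx = 1;   /* mask */
    uint32_t edx = 0;   /* counter */
    do {
        uint64_t rax = rcx;                 /* movq %rcx, %rax */
        edx += 1;                           /* addl $1, %edx */
        rcx <<= 3;                          /* salq $3, %rcx */
        rax = (uint32_t)rax & 0xDEADBEEFu;  /* andl $3735928559, %eax */
        rsi |= rax;                         /* orq %rax, %rsi */
    } while (edx != 22);                    /* cmpl $22, %edx ; jne .L2 */
    assert(rcx == 0);   /* the mask is exhausted exactly when the counter hits 22 */
    return rsi;
}

/* Hand-simplified loop: stop when the shifted mask becomes zero. */
static uint64_t run_without_counter(void)
{
    uint64_t rsi = 0;   /* result */
    uint64_t rcx = 1;   /* mask */
    do {
        uint64_t rax = rcx;                 /* movq %rcx, %rax */
        rax = (uint32_t)rax & 0xDEADBEEFu;  /* andl $3735928559, %eax */
        rsi |= rax;                         /* orq %rax, %rsi */
        rcx <<= 3;                          /* salq $3, %rcx (sets ZF on zero) */
    } while (rcx != 0);                     /* jne .L2 */
    return rsi;
}
```

Both functions return 0x48249249, so either version of the program prints "result 48249249".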

Answer 2

Let's work backwards a bit. We know that result must be the second argument to printf(). In the x86_64 calling convention, that's %rsi. The loop is everything between the .L2 label and the jne .L2 instruction. We see in the template that there's a result |= line at the end of the loop, and indeed, there's an orq instruction there with %rsi as its target, so that checks out. We can now see what it's initialized to at the top of main.

ElderBug is correct that the compiler spuriously optimized by adding a counter. But we can still figure out: which instruction runs immediately after the |= when the loop repeats? That must be the third part of the loop. What runs immediately before the body of the loop? That must be the loop initialization. Unfortunately, you'll have to figure out what would have happened on the 22nd iteration of the original loop to reverse-engineer the loop condition. (But sal is a left-shift, and that line is a vestige of the original loop condition, which would have been followed by a conditional branch before the %edx test was inserted.)

Note that the code keeps the value of mask in %rcx and modifies a copy of it in %rax, and x is folded into a constant (take a close look at the andl line).
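One detail on the andl line is that it works on %eax, the low 32 bits of %rax, while the C source asks for a 64-bit mask & x. On x86-64, writing a 32-bit register clears the upper half of the corresponding 64-bit register, and x has no bits set above bit 31, so the two agree. A small sketch of that reasoning, with hypothetical function names and uint64_t standing in for long:

```c
#include <assert.h>
#include <stdint.h>

/* What the C source asks for: a full 64-bit AND with x. */
static uint64_t and_64bit(uint64_t mask)
{
    const uint64_t x = UINT64_C(0xDEADBEEF);
    return mask & x;
}

/* What andl does: AND the low 32 bits only; the result is zero-extended. */
static uint64_t and_32bit_zero_extended(uint64_t mask)
{
    uint32_t eax = (uint32_t)mask;  /* low half of the copy in %rax */
    eax &= 0xDEADBEEFu;             /* andl $3735928559, %eax */
    return (uint64_t)eax;           /* upper 32 bits of %rax are now zero */
}

/* Asserts that both forms agree for one mask value. */
static void check_same(uint64_t mask)
{
    assert(and_64bit(mask) == and_32bit_zero_extended(mask));
}
```

For example, check_same(UINT64_C(1) << 33) passes because both forms give 0 once mask has moved past bit 31.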

Also note that you can feed the .S file to gas to get a .o and see what it does.

2 answers

solution1
1 ACCEPTED 2015-09-18 09:06:44

solution2
1 2015-09-18 10:14:32
