How to reverse engineer struct details from C source and asm output?

Question

I'm trying to understand the solution to this problem:

Given the C code below and the asm output from the compiler, what are A and B?

Answer: A is 5, B is 6.

I am guessing there has to be some sort of division done, because 96 and 48 are both divisible by 6 and 20 is divisible by 5.

EDIT:
" a char starts at any BYTE

a short starts only at EVEN bytes

an int starts at BYTE, but divisible by 4

a long starts at BYTE which is divisible by 8

str1.w is long which starts at 5 to 8

str1.x may have 184 or 180

str2.p is int starts at the value 8, hence str1.array which holds from 5 to 8 BYTES

str2.q short may be 14 to 20

str2.z may be 32

char w[A][B] and int X

8 184

Str2.

short[B] int p doublez[B] short q

20 4 8 9

hence the value of A=5 and B=6"

// #define A  ??   // 5
// #define B  ??   // 6, but the question is how to figure that out from the asm
typedef struct {
    char w[A][B];
    int x;
} str1;

typedef struct {
    short y[B];
    int p;
    double z[B];
    short q; 
} str2;

void doSub(str1 *t, str2 *u) {
    int v1 = u->p;
    int v2 = u->q;
    t->x = v1-v2;
}

# t in %rdi, u in %rsi
doSub:
    movswl   96(%rsi), %edx
    movl     20(%rsi), %eax
    subl     %edx, %eax
    movl     %eax, 48(%rdi)
    ret

Answer 1

The assembly code tells you the offsets of the fields used in the C code. From that, you can tell:

offsetof(str1, x) == 48
offsetof(str2, p) == 20
offsetof(str2, q) == 96

Now let's look at p. It comes after y, and sizeof(short) is probably 2 (unless this is a pretty unusual machine or compiler), so that tells us B*2 + padding == 20. So B is at most 10, and probably not 8 or less, since that would mean 4 or more bytes of padding before p.

Looking at q, sizeof(double) is probably 8 (again, unless unusual), so 20 + sizeof(int) + 8*B + padding == 96. If sizeof(int) == 4 (common, though different sizes for int are more common than for short/double), that gives us 8*B + padding == 72. So B is at most 9. Since short probably has less restrictive alignment than double, there's probably no padding, giving B == 9, consistent with 2 bytes of padding before p.

Looking at str1, sizeof(char) == 1 (always), so A*9 + padding == 48. So the most likely value for A is 5, with 3 bytes of padding.

Of course, the compiler is free to add any padding it wants, so any smaller values for A and B are possible, though wasteful.

Answer 2

The asm is clearly for the AMD64 SysV ABI (more links in the x86 tag wiki). I conclude that from it being x86-64 code with the first two args in %rdi and %rsi. The alignment rules given in the answer you found do match the ABI's rules for struct layout: those types have their natural alignments (n-byte types are n-byte aligned, except for 10B long double (x87 format), which is 16B-aligned).

The answer you found doesn't match your C and asm, so the A and B values are different. Sorry I didn't check this while tidying up the question; I just assumed the answer was right, since it's trivial to check with a compiler.

The SO answer you found does indeed have different structs and different asm output, so any similarity in the numeric solution is just a coincidence. Nice work @MichaelPetch for finding the original source (and copying the markdown with formatting into the question).

The following code produces asm identical to that in your actual problem, with gcc 5.3 -O3 on the Godbolt compiler explorer:

#define A  5
#define B  9
typedef struct {
    char w[A][B];      // stored from 0 to A*B - 1
    int x;             // offset = 48 = A*B padded to a 4B boundary
} str1;

typedef struct {
    short y[B];        // 2*B bytes
    int p;             // offset = 20 = 2*B rounded up to a 4byte boundary
    double z[B];       // starts at 24 (20+4, already 8byte aligned), ends at 24 + 8*B - 1
    short q;           // offset = 96 = 24 + 8 * B
} str2;

void doSub(str1 *t, str2 *u) {
    int v1 = u->p;
    int v2 = u->q;
    t->x = v1-v2;
}

I added in what we know from the asm as comments on the structs.

str2 only depends on B and has no ambiguity, so we can solve for B before worrying about A:

96 = 24 + 8 * B
72 = 8 * B
B = 72 / 8 = 9

Once we have B, str1 gives us A:

48 = align4(A*B) = align4(A*9)
45 <= A*9 <= 48
5 <= A <= 5.333

Only one integer solution: A == 5

Although honestly it was faster to solve by trial and error, since the compiler explorer site re-compiles automatically after any change. It was easy to iterate towards the right value for B to produce the 96 and 20 offsets.

Your A was already correct, but homing in on it would have been easy anyway, since the problem is separable: there was never a situation with two simultaneous equations in two unknowns.

This is where the "solution" starts to wander off track. Are you sure it was a solution to the exact same problem you posted?

str1.w is long which starts at 5 to 8
str1.x may have 184 or 180

str1.w in the code you posted is a 2-dimensional array of char, and starts at the beginning of the struct.

str1.x starts at 48 bytes into str1, as we can see from the asm.

2 answers

solution1
8 2016-04-07 03:38:47

solution2
1 ACCEPTED 2016-04-07 17:36:11
