
How to reverse engineer struct details from C source and asm output?

I'm trying to understand the solution to this problem:

Given the C code below, and the asm output from the compiler, what are A and B?

Answer: A is 5, B is 6.

I am guessing there has to be some sort of division done, because 96 and 48 are both divisible by 6 and 20 is divisible by 5.

EDIT: I found this explanation for the answer online. However, I am not sure if it is accurate:

"a char starts at any BYTE

a short starts only at EVEN bytes

an int starts at BYTE, but divisible by 4

a long starts at BYTE which is divisible by 8

str1.w is long which starts at 5 to 8

str1.x may have 184 or 180

str2.p is int starts at the value 8, hence str1.array which holds from 5 to 8 BYTES

str2.q short may be 14 to 20

str2.z may be 32

char w[A][B] and int X

8 184

Str2.

short[B] int p doublez[B] short q

20 4 8 9

hence the value of A=5 and B=6"

Code below:

// #define A  ??   // 5
// #define B  ??   // 6, but the question is how to figure that out from the asm
typedef struct {
    char w[A][B];
    int x;
} str1;

typedef struct {
    short y[B];
    int p;
    double z[B];
    short q; 
} str2;

void doSub(str1 *t, str2 *u) {
    int v1 = u->p;
    int v2 = u->q;
    t->x = v1-v2;
}

Assembly code generated for the doSub procedure:

# t in %rdi, u in %rsi
doSub:
    movswl   96(%rsi), %edx
    movl     20(%rsi), %eax
    subl     %edx, %eax
    movl     %eax, 48(%rdi)
    ret

The assembly code tells you the offsets of the fields used in the C code. So from that, you can tell:

offsetof(str1, x) == 48
offsetof(str2, p) == 20
offsetof(str2, q) == 96

Now let's look at p. It comes after y, and sizeof(short) is probably 2 (unless this is a pretty unusual machine or compiler), so that tells us B*2 + padding == 20. So B is at most 10, and probably not 8 or less.

Looking at q, sizeof(double) is probably 8 (again, unless unusual), so 20 + sizeof(int) + 8*B + padding == 96. If sizeof(int) == 4 (common, though different sizes for int are more common than for short/double), that gives us 8*B + padding == 72. So B is at most 9. Since short probably has less restrictive alignment than double, there's probably no padding before q, giving B == 9, consistent with 2 bytes of padding before p.

Looking at str1, sizeof(char) == 1 (always), so A*9 + padding == 48. So the most likely value for A is 5, with 3 bytes of padding.

Of course, the compiler is free to add any padding it wants, so any smaller values for A and B are possible, though wasteful.

The asm is clearly for the AMD64 SysV ABI (more links in the tag wiki). I conclude that from it being x86-64 code with the first two args in %rdi, %rsi. The alignment rules given in the answer you found do match the ABI's rules for struct layout: those types have their natural alignments (n-byte types are n-byte aligned, except for 10B long double (x87 format), which is 16B-aligned).


The answer you found doesn't match your C and asm, so the A and B values are different. Sorry I didn't check this while tidying up the question; I just assumed, since it's trivial to check the answer with a compiler.

The SO answer you found does indeed have different structs and different asm output, so any similarity in the numeric solution is just a coincidence. Nice work @MichaelPetch for finding the original source (and copying the markdown with formatting into the question).


The following code produces asm identical to that in your actual problem, with gcc 5.3 -O3 on the godbolt compiler explorer:

#define A  5
#define B  9
typedef struct {
    char w[A][B];      // stored from 0 to A*B - 1
    int x;             // offset = 48 = A*B padded to a 4B boundary
} str1;

typedef struct {
    short y[B];        // 2*B bytes
    int p;             // offset = 20 = 2*B rounded up to a 4byte boundary
    double z[B];       // starts at 24 (20+4, already 8byte aligned), ends at 24 + 8*B - 1
    short q;           // offset = 96 = 24 + 8 * B
} str2;

void doSub(str1 *t, str2 *u) {
    int v1 = u->p;
    int v2 = u->q;
    t->x = v1-v2;
}

I added what we know from the asm as comments on the structs.

  • str2 only depends on B, and has no ambiguity, so we can solve for B before worrying about A:

    96 = 24 + 8 * B
    72 = 8 * B
    72/8 = 9 = B

  • Once we have B, str1 gives us A:

    48 = align4(A*B) = align4(A*9)
    45 <= A*9 <= 48
    5 <= A <= 5.333
    Only one integer solution: A == 5

Although honestly it was faster to solve by trial and error, since the compiler explorer site re-compiles automatically after any change. It was easy to iterate towards the right value for B to produce the 96 and 20 offsets.

Your A was already correct, but homing in on that would have been easy, since the problem was separable. There was never a "2 simultaneous equations in 2 unknowns" situation.


This is where the "solution" starts to wander off track. Are you sure it was a solution to the exact same problem you posted?

str1.w is long which starts at 5 to 8
str1.x may have 184 or 180

str1.w in the code you posted is a 2-dimensional array of char, and starts at the beginning of the struct.

str1.x starts at 48 bytes into str1, as we can see from the asm.
