Reverse engineering C source code from assembly

Question

I would like to know if anyone can help me with a problem I am having while studying one of the lecture slides from an introductory assembly class I am taking at school. My problem is not understanding the assembly; it is working out how the C source code is ordered based on the assembly. I will post the snippet in question, which should make it clearer what I mean.

C source given:

int arith(int x, int y, int z)
{ 
   int t1 = x+y;
   int t2 = z+t1;
   int t3 = x+4;
   int t4 = y * 48; 
   int t5 = t3 + t4;
   int rval = t2 * t5;
   return rval;
}

Assembly given:

arith:
pushl %ebp
movl %esp,%ebp

movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax

movl %ebp,%esp
popl %ebp
ret

I am confused as to how I am supposed to discern, for example, that the addition z + t1 (that is, z + x + y) belongs on the second line of the source when in the assembly it comes after y * 48, or that x + 4 is the third line when in the assembly it is not even an instruction by itself but is mixed into the last leal instruction. It makes sense to me when I have the source, but I am supposed to be able to reproduce the source for a test. I do understand that the compiler optimizes things, but if anyone has a way of thinking about reverse engineering that could help me, I would greatly appreciate it if they could walk me through their thought process.

Thanks.

Answer 1

I've broken down the disassembly for you to show how the assembly was produced from the C source.

The arguments are at 8(%ebp) = x, 12(%ebp) = y, 16(%ebp) = z.

arith:

Create the stack frame:

pushl %ebp
movl %esp,%ebp

Move x into eax and y into edx:

 movl 8(%ebp),%eax
 movl 12(%ebp),%edx

t1 = x + y. leal (load effective address) adds edx and eax and leaves t1 in ecx:

 leal (%edx,%eax),%ecx

int t4 = y * 48; is done in the two steps below: multiply by 3, then by 16. t4 will eventually be in edx.

Multiply edx by 2 and add edx to the result, i.e. edx = edx * 3:

 leal (%edx,%edx,2),%edx

Shift left by 4 bits, i.e. multiply by 16:

 sall $4,%edx
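
The two steps above can be sketched in C. This is only an illustration: the function name times48 is hypothetical, and the sketch assumes a 32-bit int and uses unsigned arithmetic so that wraparound behaves the way the registers do.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original answer): computes y * 48
 * the way the leal/sall pair does. Unsigned arithmetic wraps modulo 2^32
 * like the hardware; converting back to int32_t is implementation-defined
 * for out-of-range values, but wraps on gcc. */
int32_t times48(int32_t y)
{
    uint32_t edx = (uint32_t)y;
    edx = edx + edx * 2;   /* leal (%edx,%edx,2),%edx : edx = y * 3        */
    edx = edx << 4;        /* sall $4,%edx            : edx = (y * 3) * 16 */
    return (int32_t)edx;
}
```

Calling times48(10) gives the same value as 10 * 48, which is why the compiler is free to replace the multiplication with these two cheaper instructions.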

int t2 = z+t1; ecx initially holds t1 and z is at 16(%ebp), so after this instruction ecx holds t2:

 addl 16(%ebp),%ecx

int t5 = t3 + t4; t3 was simply x + 4, and rather than calculating and storing t3 separately, the expression for t3 is folded inline. This instruction essentially computes (x+4) + t4, which is the same as t3 + t4. It adds edx (t4) and eax (x), and adds 4 as a displacement to achieve that result:

 leal 4(%edx,%eax),%eax

int rval = t2 * t5; This one is fairly straightforward: ecx holds t2 and eax holds t5. The return value is passed back to the caller in eax:

 imull %ecx,%eax

Destroy the stack frame and restore esp and ebp:

 movl %ebp,%esp
 popl %ebp

Return from the routine:

ret
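
One way to practise this kind of reading is to transliterate the assembly into C one instruction at a time, using variables named after the registers, and then simplify. The sketch below does that next to the source from the question. The name arith_from_asm is hypothetical, and the code assumes a 32-bit int; unsigned variables are used so that overflow wraps as it does in the registers.

```c
#include <stdint.h>

/* The C source given in the question. */
int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Hypothetical instruction-by-instruction transliteration of the assembly.
 * The prologue and epilogue only manage the stack frame, so they have no
 * C equivalent here. */
int arith_from_asm(int x, int y, int z)
{
    uint32_t eax, ecx, edx;

    eax = (uint32_t)x;      /* movl 8(%ebp),%eax        : eax = x            */
    edx = (uint32_t)y;      /* movl 12(%ebp),%edx       : edx = y            */
    ecx = edx + eax;        /* leal (%edx,%eax),%ecx    : ecx = t1           */
    edx = edx + edx * 2;    /* leal (%edx,%edx,2),%edx  : edx = y * 3        */
    edx <<= 4;              /* sall $4,%edx             : edx = t4           */
    ecx += (uint32_t)z;     /* addl 16(%ebp),%ecx       : ecx = t2           */
    eax = 4 + edx + eax;    /* leal 4(%edx,%eax),%eax   : eax = t5           */
    eax *= ecx;             /* imull %ecx,%eax          : eax = rval         */
    return (int)eax;        /* return value is left in eax                   */
}
```

For inputs that do not overflow, both functions return the same value, for example 606 for (1, 2, 3). Substituting the register variables back into each other is what turns the second function into something resembling the first.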

From this example you can see that the result is the same, but the structure is a bit different. Most likely this code was compiled with some optimization enabled, or someone wrote it by hand like that to demonstrate a point.

As others have said, you can't get back to the exact source from the disassembly. It is up to the person reading the assembly to interpret it and come up with equivalent C code.

To help with learning assembly and understanding the disassembly of your C programs, you can do the following on Linux:

Compile with debug information (-g), which records the source line information that objdump needs to interleave the source:

 gcc -c -g arith.c

If you're on a 64-bit machine, you can tell the compiler to create a 32-bit binary with the -m32 flag (I did so for the example below).

Use objdump to dump the object file with its source interleaved:

 objdump -d -S arith.o

-d = disassembly, -S = display source. You can add -M intel-mnemonic to use the Intel ASM syntax if you prefer it over the AT&T syntax that your example uses.

As the output at the end of this answer shows, without optimizations the compiler produces a larger binary than the example you have. You can play around with that by adding a compiler optimization flag when compiling (i.e. -O1, -O2, -O3). The higher the optimization level, the more abstract the disassembly is going to seem.

For example, with just level 1 optimization (gcc -c -g -O1 -m32 arith.c), the assembly code produced is a lot shorter.

Output of objdump -d -S arith.o for the unoptimized build:

arith.o:     file format elf32-i386


Disassembly of section .text:

00000000 <arith>:
int arith(int x, int y, int z)
{ 
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 20                sub    $0x20,%esp
   int t1 = x+y;
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   8b 55 08                mov    0x8(%ebp),%edx
   c:   01 d0                   add    %edx,%eax
   e:   89 45 fc                mov    %eax,-0x4(%ebp)
   int t2 = z+t1;
  11:   8b 45 fc                mov    -0x4(%ebp),%eax
  14:   8b 55 10                mov    0x10(%ebp),%edx
  17:   01 d0                   add    %edx,%eax
  19:   89 45 f8                mov    %eax,-0x8(%ebp)
   int t3 = x+4;
  1c:   8b 45 08                mov    0x8(%ebp),%eax
  1f:   83 c0 04                add    $0x4,%eax
  22:   89 45 f4                mov    %eax,-0xc(%ebp)
   int t4 = y * 48; 
  25:   8b 55 0c                mov    0xc(%ebp),%edx
  28:   89 d0                   mov    %edx,%eax
  2a:   01 c0                   add    %eax,%eax
  2c:   01 d0                   add    %edx,%eax
  2e:   c1 e0 04                shl    $0x4,%eax
  31:   89 45 f0                mov    %eax,-0x10(%ebp)
   int t5 = t3 + t4;
  34:   8b 45 f0                mov    -0x10(%ebp),%eax
  37:   8b 55 f4                mov    -0xc(%ebp),%edx
  3a:   01 d0                   add    %edx,%eax
  3c:   89 45 ec                mov    %eax,-0x14(%ebp)
   int rval = t2 * t5;
  3f:   8b 45 f8                mov    -0x8(%ebp),%eax
  42:   0f af 45 ec             imul   -0x14(%ebp),%eax
  46:   89 45 e8                mov    %eax,-0x18(%ebp)
   return rval;
  49:   8b 45 e8                mov    -0x18(%ebp),%eax
}
  4c:   c9                      leave  
  4d:   c3                      ret

Answer 2

You can't reproduce the original source; you can only reproduce an equivalent source.

In your case the calculation of t2 can appear anywhere after t1 and before rval.

The source might even have been a single expression:

return (x+y+z) * ((x+4) + (y * 48));
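
To make the point concrete, here is a sketch of two other sources that compute the same result as the function in the question. The function names are hypothetical and exist only for illustration; neither is claimed to be the original source.

```c
/* Hypothetical variant: t2 is computed later than in the question's source,
 * but still after t1 and before rval. */
int arith_reordered(int x, int y, int z)
{
    int t1 = x + y;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int t2 = z + t1;
    int rval = t2 * t5;
    return rval;
}

/* Hypothetical variant: the whole computation as one expression. */
int arith_one_line(int x, int y, int z)
{
    return (x + y + z) * ((x + 4) + (y * 48));
}
```

Both return 606 for the arguments (1, 2, 3), the same as the original arith. An optimizing compiler is free to produce the same instructions for all of these versions, so the assembly alone cannot tell you which one was written.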

Answer 3

When reverse engineering, you don't care about the original source code line by line; you care about what it does. A side effect is that you see what the code does, not what the programmer intended the code to do.

Answer 4

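Decompilation is not fully achievable: some knowledge is lost when going from source code (where comments and names give clues to the original programmer's intent) to binary machine code (where the instructions are simply what the processor executes).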

4 answers

Answer 1: score 9, accepted, 2011-11-13 22:07:28
Answer 2: score 6, 2011-11-13 18:04:43
Answer 3: score 5, 2011-11-13 18:34:22
Answer 4: score 1, 2011-11-13 18:09:04
