
Why is native Win32 difficult to disassemble, but a .NET app easy to disassemble?

Original post 2013-01-11 19:09:43 9 5 c#/ c++/ .net/ winapi/ native

Why is the process of disassembling a native Win32 image (built in C/C++, for example) miles more difficult than disassembling a .NET app?

What is the main reason?

5 answers

A .NET assembly is compiled to Common Intermediate Language (CIL). It is not compiled to machine code until it is about to be executed, when the CLR compiles it for the system it is running on. The CIL carries a lot of metadata so that it can be compiled for different processor architectures and different operating systems (on Linux, using Mono). The classes and methods remain largely intact.

.NET also allows for reflection, which requires metadata to be stored in the binaries.

C and C++ code is compiled for the selected processor architecture and operating system at build time. An executable compiled for Windows will not work on Linux and vice versa. The output of the C or C++ compiler is machine code (assembly instructions). The functions in the source code might not exist as functions in the binary; they may have been inlined or otherwise optimized away. Compilers can also have quite aggressive optimizers that take logically structured code and make it look very different. The code will be more efficient (in time or space), but that can make it more difficult to reverse.

Because .NET allows interoperability between languages such as C#, VB, and even C/C++ through the CLI and CLR, extra metadata has to be put into the object files to correctly transmit class and object properties. This makes disassembly easier, since the binary objects still contain that information, whereas C/C++ can throw that information away because it is not necessary (at least not for executing the code; it is of course still required at compile time).

This information is typically limited to class-related fields and objects. Variables allocated on the stack will probably not have annotations in a release build, since their information is not needed for interoperability.

One more reason: the optimizations that most C++ compilers perform when producing final binaries are not performed at the IL level for managed code.

As a result, something like iterating over a container looks like a couple of inc / jnc assembly instructions in native code, compared with function calls with meaningful names in IL. The code that is finally executed may be the same (or at least close), because the JIT compiler will inline some calls much as a native compiler does, but the code one can look at is much more readable in CLR land.

People have mentioned some of the reasons; I'll mention another one, assuming we're talking about disassembling rather than decompiling.

The trouble with x86 code is that distinguishing between code and data is very difficult and error-prone. Disassemblers have to rely on guessing in order to get it right, and they almost always miss something; by contrast, intermediate languages are designed to be "disassembled" (so that the JIT compiler can turn the "disassembly" into machine code), so they don't contain the ambiguities you would find in machine code. The end result is that disassembly of IL code is quite trivial.
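To make the code-versus-data problem concrete, here is a minimal sketch in Python. It is not a real disassembler: the `OPCODES` table covers only four genuine x86 opcodes, and `BLOB`, `linear_sweep`, and `follow_flow` are names made up for this illustration. The blob is a short jump that skips one junk byte, and that junk byte happens to be the opcode of a 5-byte `call`.

```python
# Toy decoder for a handful of real x86 opcodes (illustration only).
# opcode -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp short", 2),  # EB rel8
    0xE8: ("call", 5),       # E8 rel32
}

# jmp +1 ; one junk data byte (0xE8) ; nop ; nop ; nop ; nop ; ret
BLOB = bytes([0xEB, 0x01, 0xE8, 0x90, 0x90, 0x90, 0x90, 0xC3])


def decode(blob, offset):
    """Return (mnemonic, length) for the byte at offset; unknown bytes are 'db'."""
    return OPCODES.get(blob[offset], ("db", 1))


def linear_sweep(blob):
    """Decode from start to end, assuming every byte belongs to an instruction."""
    out, offset = [], 0
    while offset < len(blob):
        mnemonic, length = decode(blob, offset)
        out.append((offset, mnemonic))
        offset += length
    return out


def follow_flow(blob):
    """Decode only what execution reaches, following short jumps.

    Simplification: a call is treated as falling through to the next instruction.
    """
    out, offset, seen = [], 0, set()
    while offset < len(blob) and offset not in seen:
        seen.add(offset)
        mnemonic, length = decode(blob, offset)
        out.append((offset, mnemonic))
        if mnemonic == "ret":
            break
        if mnemonic == "jmp short":
            # rel8 is signed and relative to the end of the jump instruction
            rel = int.from_bytes(blob[offset + 1:offset + 2], "little", signed=True)
            offset += length + rel
        else:
            offset += length
    return out


for name, listing in (("linear sweep", linear_sweep(BLOB)),
                      ("follow flow", follow_flow(BLOB))):
    print(name)
    for offset, mnemonic in listing:
        print(f"  {offset:02x}: {mnemonic}")
```

The linear sweep reads the junk byte at offset 2 as a `call` and swallows the four `nop` bytes as its operand, so the listing shows a call that never executes and hides the instructions that do. Following the jump gives the right answer here, but only because the jump target is a constant; with computed jumps and jump tables a disassembler is back to guessing. IL has no such problem because method bodies are delimited by metadata and contain only instructions.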

If you're talking about decompiling, that's a different matter; it has to do with the (mostly) lack of optimizations in compiled .NET applications. Most optimizations are done by the JIT compiler rather than by the C#/VB.NET/etc. compiler, so the IL in the assembly is almost a 1:1 match of the source code, and figuring out the original is quite possible. But for native code, there are a million different ways to translate a handful of source lines (heck, even no-ops have a gazillion different ways of being written, with different performance characteristics!), so it's quite difficult to figure out what the original was.
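As a small illustration of the no-op remark, the sketch below lists the multi-byte NOP encodings recommended in the Intel Software Developer's Manual and uses them to pad a gap. The `NOPS` table and the `pad` helper are hypothetical names for this example, not part of any tool; real assemblers pick their padding in their own ways.

```python
# Recommended x86 NOP encodings (Intel SDM), keyed by length in bytes.
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}


def pad(n):
    """Fill n bytes using as few NOP instructions as possible."""
    out = []
    while n > 0:
        size = min(n, max(NOPS))  # take the longest encoding that still fits
        out.append(NOPS[size])
        n -= size
    return out


# Two ways to do nothing for 4 bytes: one instruction or four.
one_instruction = pad(4)
four_instructions = [NOPS[1]] * 4
print([nop.hex() for nop in one_instruction])
print([nop.hex() for nop in four_instructions])
```

Both paddings occupy the same four bytes and have the same effect on program state, yet they decode to different instruction streams. Multiply that freedom across every expression, loop, and call in a program and the mapping from native code back to source stops being anywhere near one-to-one.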

In the general case there is not much difference between disassembling C++ and .NET code. Of course C++ is harder to disassemble because it does more optimizations and stuff like that, but that's not the main issue.

The main issue is with names. Disassembled C++ code will have everything named with placeholders such as A, B, C, D, ... A1, and so on, because the original names are not in the binary. Unless you can recognize an algorithm in that form, there is not much information you can extract from the disassembled C++ binary.

A .NET library, on the other hand, contains the names of methods, method parameters, classes, and class fields. That makes understanding the disassembled code much easier. All the other stuff is secondary.

Besides the metadata, debugging information, and all the technical reasons the other answers have pointed out, what I thought about is:

The main reason you would think disassembling a Win32 image is more difficult than disassembling .NET programs is the human perspective.

From the perspective of the machine, native code is much more transparent, even for the process of reverse engineering.

Conversely, I'd like to say that disassembling .NET applications/libraries CAN be more difficult, if the code has been obfuscated.

You might think it's difficult to disassemble native Win32 programs because they consist of machine code. But in fact, by analogy with the physical world and the psychical one, I think machine code is more like the physical one: what you see is what it actually does. Although reverse engineering of Win32 programs can be very complex, the code stays within the scope of the CPU's instruction set. The most complicated things might be:

addressing
memory/register access
hardware communication
OS-level technology (processes, swapping, paging, etc.)

There are a number of obfuscators and de-obfuscators for .NET, implemented with different techniques. It is entirely possible to make .NET applications much more difficult to disassemble than Win32 programs. As for why most virtual-machine-based programs are nevertheless easier to disassemble, I think the following considerations keep them from being too heavily obfuscated:

execution performance
code optimizability
maintainability
cost considerations

If you've read the code of OpCodes in the .NET Framework, you will realize that it involves more complicated language-level and OOP concepts. For example, with Reflection.Emit you can emit the opcode for calling a constructor, a method, or a virtual method. Yes, it's based on MSIL (CIL) and run by the CLR, but that does not mean it is easier to disassemble; it can be produced in an obfuscated manner, and then becomes much more difficult to reverse back to source code, just as the psychical world is always more impalpable than the physical world.