How does a program look in memory?

How is a program (e.g. C or C++) arranged in computer memory? I know a little about segments, variables etc., but basically I have no solid understanding of the entire structure.

Since the in-memory structure may differ, let's assume a C++ console application on Windows.

Some pointers to what I'm after specifically:

  • Outline of a function, and how is it called?
  • Each function has a stack frame; what does that contain and how is it arranged in memory?
  • Function arguments and return values
  • Global and local variables?
  • const static variables?
  • Thread local storage

Links to tutorial-like material and such are welcome, but please no reference-style material assuming knowledge of assembler etc.

Might this be what you are looking for:

http://en.wikipedia.org/wiki/Portable_Executable

The PE file format is the binary file structure of Windows binaries (.exe, .dll etc.). Basically, they are mapped into memory in that layout. More details are described here, with an explanation of how you can look at the binary representation of loaded DLLs in memory yourself:

http://msdn.microsoft.com/en-us/magazine/cc301805.aspx

Edit:

Now I understand that you want to learn how source code relates to the binary code in the PE file. That's a huge field.

First, you have to understand the basics of computer architecture, which will involve learning the general basics of assembly code. Any "Introduction to Computer Architecture" college course will do. Literature includes e.g. "John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach" or "Andrew Tanenbaum, Structured Computer Organization".

After reading this, you should understand what a stack is and how it differs from the heap, what the stack pointer and the base pointer are, what the return address is, how many registers there are, etc.

Once you've understood this, it is relatively easy to put the pieces together:

A C++ object contains code and data, i.e., member variables. A class

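class SimpleClass {
public:  // public so that c1.m_nInteger below compiles
     int m_nInteger;
     double m_fDouble;

     // Converts m_nInteger to double and adds it to m_fDouble.
     double SomeFunction() { return m_nInteger + m_fDouble; }
};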

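will be 4 + 8 bytes of member data in memory. Most compilers, including Visual C++, insert 4 bytes of padding after the int so that the double is 8-byte aligned, so sizeof(SimpleClass) is typically 16. What happens when you do: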

SimpleClass c1;
c1.m_nInteger = 1;
c1.m_fDouble = 5.0;
c1.SomeFunction();

First, object c1 is created on the stack, i.e., the stack pointer esp is decreased by at least the size of the object (12 bytes of data, plus any padding) to make room. Then the constant 1 is written to the address of c1.m_nInteger and the constant 5.0 to the address of c1.m_fDouble, both at fixed offsets inside the area that was just reserved (typically addressed as ebp minus a constant).
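The layout described above can be checked directly. The following is only a sketch: the helper names OffsetOfInteger, OffsetOfDouble and PrintLayout are made up for illustration, and the exact size and padding depend on the compiler and target.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdio>    // std::printf

// Same members as SimpleClass in the text; a struct so they are public.
struct SimpleClass {
    int    m_nInteger;
    double m_fDouble;

    double SomeFunction() { return m_nInteger + m_fDouble; }
};

// Hypothetical helpers: byte distance of each member from the object start.
std::size_t OffsetOfInteger() { return offsetof(SimpleClass, m_nInteger); }
std::size_t OffsetOfDouble()  { return offsetof(SimpleClass, m_fDouble); }

// Creates c1 on the stack and prints where it and its members live.
void PrintLayout() {
    SimpleClass c1;              // lives in the stack frame of PrintLayout
    c1.m_nInteger = 1;
    c1.m_fDouble  = 5.0;

    std::printf("sizeof(SimpleClass) = %zu\n", sizeof(SimpleClass));
    std::printf("offset of m_nInteger = %zu\n", OffsetOfInteger());
    std::printf("offset of m_fDouble  = %zu\n", OffsetOfDouble());
    std::printf("&c1            = %p\n", static_cast<void*>(&c1));
    std::printf("&c1.m_nInteger = %p\n", static_cast<void*>(&c1.m_nInteger));
    std::printf("&c1.m_fDouble  = %p\n", static_cast<void*>(&c1.m_fDouble));
}
```

Calling PrintLayout() from main shows that the first member starts at offset 0 and that the member function adds nothing to the size of the object.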

Then we call a function, which means two things.

  1. The computer has to load the part of the binary PE file that contains the function SomeFunction() into memory. SomeFunction will only be in memory once, no matter how many instances of SimpleClass you create.

  2. The computer has to execute the function SomeFunction(). That means several things:

    1. Calling the function also implies passing all parameters; often this is done on the stack. SomeFunction has one (!) parameter, the this pointer, i.e., the pointer to the memory address on the stack where we have just written the values 1 and 5.0. (32-bit Visual C++ normally passes this in register ecx rather than on the stack.)
    2. Save the current program state, i.e., the address of the instruction that will be executed when SomeFunction returns. Calling a function means pushing this return address on the stack and setting the instruction pointer (register eip) to the address of SomeFunction.
    3. Inside SomeFunction, the old stack frame is saved by storing the old base pointer (ebp) on the stack (push ebp) and making the stack pointer the new base pointer (mov ebp, esp).
    4. The actual binary code of SomeFunction is executed; it uses the machine instructions that convert m_nInteger to a double and add it to m_fDouble. m_nInteger and m_fDouble are reached through the this pointer, which points at c1 in the caller's stack frame.
    5. The result of the addition is stored in a register and the function returns. That means the stack frame is discarded: the stack pointer is set back to the base pointer, the old base pointer is restored (next value on the stack), and then the instruction pointer is set to the return address (again the next value on the stack). Now we're back in the original state, but the result of SomeFunction() lurks in some register.
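The two points made in this list, that the code exists only once and that this is just a hidden parameter, can be sketched in C++ itself. SomeFunction_Explicit and CallThroughPointer are hypothetical names used only for this illustration; the compiler's real output is machine code, not a second C++ function.

```cpp
// Same members as SimpleClass in the text; a struct so they are public.
struct SimpleClass {
    int    m_nInteger;
    double m_fDouble;

    double SomeFunction() { return m_nInteger + m_fDouble; }
};

// Roughly how the compiler treats the member function: an ordinary function
// that receives the object's address as an explicit first argument.
double SomeFunction_Explicit(SimpleClass* self) {
    return self->m_nInteger + self->m_fDouble;
}

// One function address serves every instance: the same member function
// pointer is applied to whatever object is passed in.
double CallThroughPointer(SimpleClass& obj) {
    double (SimpleClass::*fn)() = &SimpleClass::SomeFunction;
    return (obj.*fn)();
}
```

For an object holding 1 and 5.0, c1.SomeFunction() and SomeFunction_Explicit(&c1) compute the same sum; only the way the object address reaches the code differs.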

I suggest you build such a simple example yourself and step through the disassembly. In a debug build the code will be easy to understand, and Visual Studio displays variable names in the disassembly view. See what the registers esp, ebp and eip do, where in memory your object is allocated, where the code is, etc.

What a huge question!

First you want to learn about virtual memory. Without that, nothing else will make sense. In short, C/C++ pointers are not physical memory addresses. Pointers are virtual addresses. There's a special CPU feature (the MMU, memory management unit) that transparently maps them to physical memory. Only the operating system is allowed to configure the MMU.

This provides safety (there is no C/C++ pointer value you can possibly make that points into another process's virtual address space, unless that process is intentionally sharing memory with you) and lets the OS do some really magical things that we now take for granted (like transparently swapping some of a process's memory to disk, then transparently loading it back when the process tries to use it).

A process's address space (aka virtual address space, aka addressable memory) contains:

  • a huge region of memory that's reserved for the Windows kernel, which the process isn't allowed to touch;

  • regions of virtual memory that are "unmapped", i.e. nothing is loaded there, there's no physical memory assigned to those addresses, and the process will crash if it tries to access them;

  • parts of the various modules (EXE and DLL files) that have been loaded (each of these contains machine code, string constants, and other data); and

  • whatever other memory the process has allocated from the system.
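To get a feel for these regions, a program can sample a few of its own virtual addresses. This is a rough sketch; AddressSample, SampleAddresses and PrintAddresses are names invented here, and the actual values differ between runs and systems.

```cpp
#include <cstdint>   // std::uintptr_t
#include <cstdio>    // std::printf
#include <cstdlib>   // std::malloc, std::free

int g_counter = 42;                    // global data inside the loaded EXE
const char* const g_text = "hello";   // string constant inside the loaded EXE

int AddOne(int x) { return x + 1; }    // machine code inside the loaded EXE

// Hypothetical record of one virtual address per kind of memory.
struct AddressSample {
    std::uintptr_t code, global, literal, heap, local;
};

AddressSample SampleAddresses() {
    int local = 0;                      // on the current thread's stack
    void* heap = std::malloc(16);       // memory allocated from the system
    AddressSample s{};
    s.code    = reinterpret_cast<std::uintptr_t>(&AddOne);
    s.global  = reinterpret_cast<std::uintptr_t>(&g_counter);
    s.literal = reinterpret_cast<std::uintptr_t>(g_text);
    s.heap    = reinterpret_cast<std::uintptr_t>(heap);
    s.local   = reinterpret_cast<std::uintptr_t>(&local);
    std::free(heap);
    return s;
}

void PrintAddresses(const AddressSample& s) {
    std::printf("code    %#llx\n", static_cast<unsigned long long>(s.code));
    std::printf("global  %#llx\n", static_cast<unsigned long long>(s.global));
    std::printf("literal %#llx\n", static_cast<unsigned long long>(s.literal));
    std::printf("heap    %#llx\n", static_cast<unsigned long long>(s.heap));
    std::printf("local   %#llx\n", static_cast<unsigned long long>(s.local));
}
```

On most systems the code, global and literal addresses cluster together (they belong to the same module), while the heap and stack addresses sit in clearly different ranges.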

Now typically a process lets the C Runtime Library or the Win32 libraries do most of the super-low-level memory management, which includes setting up:

  • a stack (for each thread), where local variables and function arguments and return values are stored; and

  • a heap, where memory is allocated if the process calls malloc or does new X.
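The difference between the two can be sketched as follows. The function names are hypothetical; the point is only that every active call has its own slot on the stack, while heap memory stays valid until it is explicitly released.

```cpp
// Each active call gets its own stack frame, so a local variable of the
// callee never shares an address with a live local variable of the caller.
bool InnerHasOwnFrame(const int* callerLocal) {
    int calleeLocal = 0;
    return &calleeLocal != callerLocal;
}

bool FramesAreDistinct() {
    int local = 0;
    return InnerHasOwnFrame(&local);
}

// Heap memory outlives the call that allocated it; the caller must
// release it with delete when it is no longer needed.
int* MakeOnHeap(int value) {
    return new int(value);
}
```

Returning the address of calleeLocal instead would be a bug, because that stack slot is reused as soon as the function returns; the pointer returned by MakeOnHeap stays usable until delete is called on it.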

For more about how the stack is structured, read about calling conventions. For more about how the heap is structured, read about malloc implementations. In general the stack really is a stack, a last-in-first-out data structure, containing arguments, local variables, and the occasional temporary result, and not much more. Since it is easy for a program to write straight past the end of the stack (the common C/C++ bug after which this site is named), the system libraries typically make sure that there is an unmapped page adjacent to the stack. This makes the process crash instantly when such a bug happens, so it's much easier to debug (and the process is killed before it can do any more damage).

The heap is not really a heap in the data structure sense. It's a data structure maintained by the CRT or Win32 library that takes pages of memory from the operating system and parcels them out whenever the process requests small pieces of memory via malloc and friends. (Note that the OS does not micromanage this; a process can to a large extent manage its address space however it wants, if it doesn't like the way the CRT does it.)
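The idea of taking one large block and parcelling it out can be illustrated with a toy allocator. This is a deliberately simplified sketch, not how the CRT heap is implemented: ToyArena is a made-up class, it obtains its block from malloc instead of from the operating system, and it never reuses freed pieces.

```cpp
#include <cstddef>   // std::size_t, std::max_align_t
#include <cstdlib>   // std::malloc, std::free

// Toy "bump" allocator: grabs one block up front and hands out
// consecutive, suitably aligned slices of it.
class ToyArena {
public:
    explicit ToyArena(std::size_t bytes)
        : m_base(static_cast<unsigned char*>(std::malloc(bytes))),
          m_size(m_base ? bytes : 0),
          m_used(0) {}

    ~ToyArena() { std::free(m_base); }

    ToyArena(const ToyArena&) = delete;
    ToyArena& operator=(const ToyArena&) = delete;

    // Returns a slice of at least `bytes` bytes, or nullptr if the
    // arena does not have enough room left.
    void* Allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        if (bytes == 0) bytes = 1;                      // keep results distinct
        if (bytes > m_size - m_used) return nullptr;    // clearly too large
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > m_size - m_used) return nullptr;  // no room after rounding
        void* p = m_base + m_used;
        m_used += rounded;                              // "bump" the watermark
        return p;
    }

    std::size_t Used() const { return m_used; }

private:
    unsigned char* m_base;   // start of the big block
    std::size_t    m_size;   // total bytes in the block
    std::size_t    m_used;   // bytes handed out so far
};
```

A real heap additionally tracks freed pieces so they can be handed out again, and asks the operating system for more pages when it runs out.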

A process can also request pages directly from the operating system, using an API like VirtualAlloc or MapViewOfFile.
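A minimal sketch of requesting pages directly might look like this. ReservePages and ReleasePages are hypothetical wrapper names; the Windows branch uses VirtualAlloc and VirtualFree, and the mmap branch is only an assumption added here so the sketch also builds on POSIX systems.

```cpp
#include <cstddef>   // std::size_t
#ifdef _WIN32
#  include <windows.h>    // VirtualAlloc, VirtualFree
#else
#  include <sys/mman.h>   // mmap, munmap (non-Windows fallback, an assumption)
#endif

// Asks the operating system for zero-filled, readable and writable pages.
// Returns nullptr on failure.
void* ReservePages(std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Gives the pages back; afterwards the addresses are unmapped again.
void ReleasePages(void* p, std::size_t bytes) {
#ifdef _WIN32
    (void)bytes;                       // MEM_RELEASE requires a size of 0
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, bytes);
#endif
}
```

The operating system rounds such requests up to whole pages, which is why small allocations normally go through a heap instead.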

There's more, but I'd better stop!

For understanding stack frame structure you can refer to http://en.wikipedia.org/wiki/Call_stack

It gives you information about the structure of the call stack and how locals and the return address are stored on it (globals live outside the stack, in the module's data section).

It might not be the most accurate information, but MS Press provides some sample chapters of the book Inside Microsoft Windows 2000, Third Edition, containing information about processes and their creation along with images of some important data structures.

I also stumbled upon this PDF that summarizes some of the above information in a nice chart.

But all the provided information is more from the OS point of view and not very detailed about the application aspects.

Actually, you won't get far in this matter without at least a little bit of knowledge of assembler. I'd recommend a reverse-engineering (tutorial) site, e.g. OpenRCE.org.
