
How are the different segments like heap, stack, text related to the physical memory?

  1. When a C program is compiled and the object file (ELF) is created, the object file contains different sections such as bss, data, text and other segments. I understood that these sections of the ELF are part of the virtual memory address space. Am I right? Please correct me if I am wrong.

  2. Also, there will be a virtual memory and page table associated with the compiled program. The page table associates the virtual memory addresses present in the ELF with the real physical memory addresses when loading the program. Is my understanding correct?

  3. I read that in the created ELF file, the bss section just keeps the reference of the uninitialised global variables. Does "uninitialised global variable" here mean a variable that is not initialised at its declaration?

  4. Also, I read that local variables are allocated space at run time (i.e., on the stack). Then how will they be referenced in the object file?

  5. Suppose the program has a particular section of code that allocates memory dynamically. How will these variables be referenced in the object file?

I am confused about whether these different segments of the object file (like text, rodata, data, bss, stack and heap) are part of the physical memory (RAM), where all the programs are executed. I feel that my understanding is wrong. How are these different segments related to the physical memory when a process or a program is in execution?

1. Correct, the ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size; since it's supposed to be all zeros, there is no need to actually have the zeros in the ELF file.) Note that locations can be absolute (like virtual address 0x100000) or relative (like 4096 bytes after the end of text).

2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).

3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables; it is the storage backing those variables.

4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.

5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)

The ELF file explains to an OS how to set up a virtual address space for an instance of a program. The different sections describe different needs. For example, ".rodata" says "I'd like to store read-only data" (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)

Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details of which physical pages are used.

I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.

4 : They are referenced by offset from the top of the stack. When executing a function, the top of the stack is moved to allocate space for local variables. The compiler determines the order of local variables in the stack, so the compiler knows the offset of each variable from the top of the stack.

The stack in memory is positioned upside down. The beginning of the stack usually has the highest memory address available. As the program runs and allocates space for local variables, the address of the top of the stack decrements (and can potentially lead to stack overflow, overlapping with segments at lower addresses :-) )

5 : Using pointers. The address of a dynamically allocated variable is stored in a (local) variable. This corresponds to using pointers in C.

I have found a nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html

Just read the man page of the readelf command to find out how to display the starting addresses of the different segments of your program.

Regarding the first question, you are absolutely right. Since most of today's systems use run-time binding, the actual physical addresses are known only during execution. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries at compile and load time. Hence the virtual addresses.

Coming to the second question, the mapping is established at run time, due to run-time binding. Your understanding in the third question is correct. All uninitialized global variables and static variables go into BSS. Also note the special case: they go into BSS even if they are explicitly initialized to 0.

All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:

$ size -A -x my_elf_binary

are virtual addresses. The MMU, together with the operating system, performs the translation from virtual addresses to physical RAM addresses.

If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resource. C code is just a lightweight script to drive the OS and to run only simple operations with registers.

  1. Virtual memory and physical memory are about the CPU's address-translation hardware (the MMU and its TLB, driven by the page tables) letting a user-space process use memory that is contiguous in virtual terms. The actual physical memory mapped to that contiguous virtual memory can be scattered anywhere in RAM. The compiled program doesn't know about the TLB or about physical memory addresses. They are managed in OS kernel space.

  2. BSS is a section which the OS prepares as zero-filled memory, because the variables in it were not initialized in the C/C++ source code and were therefore marked as bss by the compiler/linker.

  3. The stack starts out with only a small amount of memory prepared by the OS. Every time a function call is made, the stack pointer is pushed down so that there is more space to place the local variables, and it is popped back when the function returns. When the initial small amount of memory is full and the stack reaches its bottom, a page fault exception occurs, the OS kernel maps new physical memory to the virtual address, and the user process can continue working.

  4. No magic. In object code, every operation done on the pointer returned from malloc is handled as an offset from the register value returned by the malloc function call.

Actually malloc is doing quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) for improving dynamic allocation, but in the end they all get new memory regions from the OS using sbrk or mmap (e.g. of /dev/zero), which is called anonymous memory.

4. If you look at the assembler code generated by gcc, you can see that memory for local variables is allocated on the stack with the push instruction or by changing the value of the ESP register. Then they are initialized with the mov instruction or something like that.


 