How does the global variable declaration solve the stack overflow in C?

I have some C code. What it does is simple: it reads an array of numbers from standard input, then sorts it.

#include <stdio.h>
#include <stdlib.h>

#define ARRAY_MAX 2000000

int main(void) {
    int my_array[ARRAY_MAX];
    int w[ARRAY_MAX];
    int count = 0;

    while (count < ARRAY_MAX && 1 == scanf("%d", &my_array[count])) {
        count++;
    }

    merge_sort(my_array, w, count);
    return EXIT_SUCCESS;
}

It works well, but if I really give it 2,000,000 numbers, it causes a stack overflow. Yes, it uses up the whole stack. One solution is to use malloc() to allocate memory for these two arrays, which moves them to the heap, and then there is no problem at all.
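A minimal sketch of the malloc() approach described above, assuming the input is whitespace-separated integers. The helper name `read_numbers` is hypothetical (it is not in the original program), and the scratch array `w` would be allocated the same way.

```c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_MAX 2000000

/* Hypothetical helper: reads up to ARRAY_MAX ints from `in` into heap storage.
 * On success it stores the buffer in *out (the caller must free it) and
 * returns the number of values read; it returns -1 if malloc() fails. */
long read_numbers(FILE *in, int **out) {
    int *buf = malloc(ARRAY_MAX * sizeof *buf);  /* heap, not stack */
    if (buf == NULL) {
        return -1;
    }

    long count = 0;
    while (count < ARRAY_MAX && fscanf(in, "%d", &buf[count]) == 1) {
        count++;
    }

    *out = buf;
    return count;
}
```

Calling `read_numbers(stdin, &my_array)` from main leaves only a pointer and a counter on the stack; the 2,000,000 ints themselves live in the block returned by malloc().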

The other solution is to move the two declarations below to global scope, making them global variables.

    int my_array[ARRAY_MAX];
    int w[ARRAY_MAX];

My tutor told me that this solution does the same job: it moves these two variables onto the heap.

But I checked some documents online. Global variables without initialisation reside in the bss segment, right?

From what I read online, the size of this section is just a few bytes. How could it prevent the stack overflow?

Or is it that, because these two variables are arrays, they are pointers, and global pointers reside in the data segment, which would mean that the size of the data segment can also change dynamically?

The bss (block started by symbol) section is tiny in the object file (4 or 8 bytes), but the value stored is the number of bytes of zeroed memory to allocate after the initialized data.

It avoids the stack overflow by allocating the storage 'not on the stack'. It is normally in the data segment, after the text segment and before the start of the heap segment, although that simple memory picture can be more complicated these days.

Officially, there should be caveats about 'the standard doesn't say that there must be a stack' and various other minor bits and pieces, but that doesn't alter the substance of the answer. The bss section is small because it is a single number, but the number can represent an awful lot of memory.
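To put numbers on this, here is a rough sketch that computes how much storage the two arrays need and compares it with the current stack limit. It assumes a POSIX system that provides getrlimit(); the function names are made up for illustration. With a 4-byte int the arrays need 2 × 2,000,000 × 4 = 16,000,000 bytes, which is well above the 8 MiB (8,388,608 bytes) that many Linux systems use as the default stack limit.

```c
#include <stddef.h>
#include <sys/resource.h>

#define ARRAY_MAX 2000000

/* Bytes needed by the two arrays (my_array and w) from the question. */
size_t arrays_bytes(void) {
    return 2 * (size_t)ARRAY_MAX * sizeof(int);
}

/* Returns 1 if the two arrays alone would fit under the soft stack limit,
 * 0 if they would not, and -1 if the limit cannot be queried.
 * This ignores everything else that also lives on the stack. */
int arrays_fit_on_stack(void) {
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return -1;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return 1;  /* no soft limit configured */
    }
    return arrays_bytes() <= lim.rlim_cur;
}
```

The same arrays at file scope are not counted against this limit at all, which is why moving them out of main makes the overflow go away.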

Disclaimer: This is not a guide, it is an overview. It is based on how Linux does things, though I may have gotten some details wrong. Most (desktop) operating systems use a very similar model, with different details. Additionally, this only applies to userspace programs, which is what you're writing unless you're developing for the kernel or working on modules (Linux), drivers (Windows), or kernel extensions (OS X).

Virtual Memory: I'll go into more detail below, but the gist is that each process gets an exclusive 32-/64-bit address space. And obviously a process's entire address space does not always map to real memory. This means A) one process's addresses mean nothing to another process and B) the OS decides which parts of a process's address space are loaded into real memory and which parts can stay on disk, at any given point in time.

Executable File Format

Executable files have a number of different sections. The ones we care about here are .text, .data, .bss, and .rodata. The .text section is your code. The .data and .bss sections hold global variables. The .rodata section holds constant-value 'variables' (aka consts): things like error strings and other messages, or perhaps magic numbers, which your program needs to refer to but never changes.

The .data section stores global variables that have an initial value. This includes variables defined as <type> <varname> = <value>; for example, a data structure containing state variables, with initial values, that your program uses to keep track of itself.

The .bss section records global variables that do not have an initial value, or that have an initial value of zero. This includes variables defined as <type> <varname>; and <type> <varname> = 0; definitions. Since the compiler and the OS both know that variables in the .bss section should be initialized to zero, there's no reason to actually store all of those zeros. So the executable file only stores variable metadata, including the amount of memory that should be allocated for the variable.
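As an illustration of the sections described above, the following sketch declares one object of each kind. The names are invented for the example, and the section noted in each comment is where a typical compiler and linker would place the object, not something the C standard guarantees.

```c
#include <stddef.h>

int counter = 42;                /* nonzero initial value: typically .data   */
int big_buffer[1000000];         /* no initial value: typically .bss         */
int zeroed = 0;                  /* initial value of zero: typically .bss    */
const char greeting[] = "hello"; /* never modified: typically .rodata        */

/* The machine code for this function typically ends up in .text. */
int add_one(int x) {
    return x + 1;
}
```

On a GNU toolchain, the `size` utility run on the compiled object file is one way to check this. With a 4-byte int I would expect `big_buffer` to add about 4,000,000 bytes to the reported bss size while the file on disk stays small.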

Process Memory Layout

When the OS loads your executable, it creates six memory segments. The bss, data, and text segments are all located together. The data and text segments are loaded from the file (not really; see Virtual Memory). The bss segment is allocated to the size of all of your uninitialized/zero-initialized variables (see Virtual Memory). The memory mapping segment is similar to the data and text segments in that it consists of blocks of memory that are loaded from files (see Virtual Memory). This is where dynamic libraries are loaded.

The bss, data, and text segments are fixed-size. The memory mapping segment is effectively fixed-size, but it will grow when your program loads a new dynamic library or uses another memory mapping function. However, this does not happen often, and the size increase is always the size of the library or file (or shared memory) being mapped.

The Stack

The stack is a bit more complicated. A zone of memory is reserved for the stack; its maximum size is usually set by the system rather than by your code (on Linux, by a resource limit that often defaults to 8 MiB). On x86 Linux the stack starts near a high memory address, where the main function's variables live, and grows toward lower addresses. During execution, more variables may be added to or removed from the low-address end of the stack. Pushing data onto the stack grows it down, decreasing the stack pointer (which holds the address of the most recently pushed item). Popping data off the stack shrinks it back up, increasing the stack pointer. When a function is called, the address of the next instruction in the calling function (the return address, within the text segment) is pushed onto the stack. When a function returns, it restores the stack to the state it was in before the function was called (everything it pushed onto the stack is popped off) and jumps to the return address.
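One rough way to see which direction the stack moves on a given machine is to compare the address of a local variable in a caller with the address of a local in the function it calls. This is only a sketch: the function names are invented, `__attribute__((noinline))` is a GCC/Clang extension used here to keep the two frames separate, and the C standard makes no promise about the result.

```c
#include <stdint.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))  /* GCC/Clang extension */
#else
#define NOINLINE
#endif

/* Returns 1 if a local in this (deeper) frame has a lower address
 * than the local whose address the caller passed in. */
NOINLINE static int callee_is_lower(uintptr_t caller_addr) {
    volatile char marker = 0;
    return (uintptr_t)&marker < caller_addr;
}

/* Returns 1 if a nested call landed at a lower address, i.e. the
 * stack appears to grow toward lower addresses; 0 otherwise. */
int stack_grows_down(void) {
    volatile char marker = 0;
    return callee_is_lower((uintptr_t)&marker);
}
```

On common x86 and x86-64 Linux setups I would expect `stack_grows_down()` to return 1, but sanitizers, unusual ABIs, or aggressive optimisation can change what this shows.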

If the stack grows too large, the result depends on many factors. Sometimes you get a stack overflow. Sometimes the system tries to allocate more memory for the stack. This topic is beyond the scope of this answer.

The Heap

The heap is used for dynamic memory allocation. Memory allocated with one of the alloc functions (malloc, calloc, realloc) lives on the heap. All other memory allocations are not on the heap. The heap starts as a large block of unused memory. When you allocate memory on the heap, the allocator in the C library tries to find space within the heap for your allocation, asking the OS for more memory if it needs to. I'm not going to go over how the actual allocation process works.

Virtual Memory

The OS makes your process think that it has the entire 32-/64-bit address space to play in. Obviously, this cannot all be backed by real memory: often it would mean your process had access to more memory than your computer physically has, and on a 32-bit processor with 4GB of memory it would mean your process had access to every bit of memory, with no room left for other processes.

The addresses that your process uses are fake. They do not refer directly to physical memory. Additionally, most of the memory in your process's address space is inaccessible, because it refers to nothing (on a 32-bit processor it may not be most). The ranges of usable/valid addresses are partitioned into pages. The kernel maintains a page table for each process.
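To make the idea of pages concrete, this sketch asks the system for its page size and works out how many pages a block of memory would span. It assumes a POSIX system where sysconf(_SC_PAGESIZE) is available; the function names are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Page size in bytes as reported by the OS, or -1 on error. */
long page_size(void) {
    return sysconf(_SC_PAGESIZE);
}

/* Number of pages needed to cover `bytes` bytes, rounding up. */
size_t pages_for(size_t bytes, size_t page) {
    return (bytes + page - 1) / page;
}
```

With 4096-byte pages, one 8,000,000-byte array from the question spans `pages_for(8000000, 4096)`, which is 1954 pages.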

When your executable is loaded, and when your process loads a file, in reality it is mapped to one or more pages. The OS does not necessarily load that file into memory. What it does is create enough entries in the page table to cover the entire file, while noting that those pages are backed by a file. Entries in the page table have two flags and an address. The first flag (valid/invalid) indicates whether or not the page is loaded in real memory. If the page is not loaded, the other flag and the address are meaningless. If the page is loaded, the second flag indicates whether or not the page's real memory has been modified since it was loaded, and the address maps the page to real memory.

The stack, heap, and bss work similarly, except they are not backed by a 'real' file. If the OS decides that one of your process's pages isn't being used, it will unload that page. Before it unloads the page, if the modified flag is set in the page table for that page, it will save the page to disk somewhere. This means that if a page in the stack or heap is unloaded, a 'file' will be created that now maps to that page.

When your process tries to access a (virtual) memory address, the kernel/memory management hardware uses the page table to translate that virtual address to a real memory address. If the valid/invalid flag is invalid, a page fault is triggered. The kernel pauses your process, locates or makes a free page, loads part of the mapped file (or fake file for the stack or heap) into that page, sets the valid/invalid flag to valid, updates the address, then reruns the original instruction that triggered the page fault.
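The translation and page-fault steps above can be sketched as a toy model in ordinary C. This is a deliberate simplification and not how any real kernel is written: the table has only eight pages, "loading" a page just hands out the next unused frame number, and every name here is invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 8u

/* Simplified page table entry: two flags and an address (frame number). */
struct pte {
    bool valid;      /* is the page loaded in real memory?  */
    bool dirty;      /* has it been modified since loading? */
    uint32_t frame;  /* which physical frame holds the page */
};

static struct pte table[NUM_PAGES];
static uint32_t next_free_frame = 0;
unsigned fault_count = 0;

/* Toy page-fault handler: pick a free frame and mark the page valid. */
static void handle_fault(uint32_t page) {
    fault_count++;
    table[page].frame = next_free_frame++;
    table[page].dirty = false;
    table[page].valid = true;
}

/* Translates a virtual address into a "physical" one. Returns false if
 * the address lies outside this toy address space. */
bool translate(uint32_t vaddr, bool is_write, uint32_t *paddr) {
    uint32_t page = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (page >= NUM_PAGES) {
        return false;
    }
    if (!table[page].valid) {
        handle_fault(page);  /* then the access is retried */
    }
    if (is_write) {
        table[page].dirty = true;
    }
    *paddr = table[page].frame * PAGE_SIZE + offset;
    return true;
}
```

The first access to a page triggers one fault; later accesses to the same page translate directly, and a write sets the modified flag so the page would be saved before being unloaded.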

AFAIK, the bss section is a special page or pages. When a page in this section is first accessed (and triggers a page fault), the page is zeroed before the kernel returns control to your process. This means that the kernel doesn't pre-zero the entire bss section when your process is loaded.
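From the program's point of view, the visible effect is simply that an uninitialised static array reads as all zeros the first time it is touched. A small sketch that checks this (the function name is hypothetical; the array mirrors `w` from the question):

```c
#include <stddef.h>

#define ARRAY_MAX 2000000

static int w[ARRAY_MAX];  /* no initialiser: static storage, starts zeroed */

/* Counts elements of w that are not zero. */
size_t count_nonzero(void) {
    size_t n = 0;
    for (size_t i = 0; i < ARRAY_MAX; i++) {
        if (w[i] != 0) {
            n++;
        }
    }
    return n;
}
```

Before anything is written to `w`, `count_nonzero()` returns 0. Whether the zeroing happens lazily, page by page, is an operating system detail that this code cannot observe.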

Global variables are not allocated on the stack. They are allocated in the data segment (if initialised) or the bss (if they are uninitialised).
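A short sketch of the file-scope version, which also addresses the "arrays are pointers" worry from the question: a global array is a complete object of ARRAY_MAX ints with static storage, not a pointer to memory somewhere else. The two helper functions are hypothetical and exist only to expose the sizes.

```c
#include <stddef.h>

#define ARRAY_MAX 2000000

/* File scope: static storage duration, so neither array uses the stack. */
int my_array[ARRAY_MAX];
int w[ARRAY_MAX];

/* Size of the whole array object, not of a pointer to it. */
size_t my_array_bytes(void) {
    return sizeof my_array;
}

/* Size of a plain pointer to int, for comparison. */
size_t pointer_bytes(void) {
    return sizeof(int *);
}
```

`my_array_bytes()` is `ARRAY_MAX * sizeof(int)`, far larger than `pointer_bytes()`. The array only converts to a pointer to its first element when it is used in an expression such as a function call argument.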

Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish one, please cite this site's URL or the original address.

 