How does the global variable declaration solve the stack overflow in C?

Question

I have some C code. What it does is simple, get some array from io, then sort it.

#include <stdio.h>
#include <stdlib.h>

#define ARRAY_MAX 2000000

int main(void) {
    int my_array[ARRAY_MAX];
    int w[ARRAY_MAX];
    int count = 0;

    while (count < ARRAY_MAX && 1 == scanf("%d", &my_array[count])) {
        count++;
    }

    merge_sort(my_array, w, count);
    return EXIT_SUCCESS;
}

And it works well, but if I really give it a group of number which is 2000000, it cause a stack overflow. Yes, it used up all the stack. One of the solution is to use malloc() to allocate a memory space for these 2 variables, to move them to the heap, so no problem at all.

The other solution is to move the below 2 declaration to the global scope, to make them global variables.

    int my_array[ARRAY_MAX];
    int w[ARRAY_MAX];

My tutor told me that this solution does the same job: to move these 2 variables into the heap.

But I checked some documents online. Global variables, without initialisation, they will reside in the bss segment, right?

I checked online, the size of this section is just few bytes. How could it prevent the stack overflow?

Or, because these 2 types are array, so they are pointers, and global pointers reside in data segment, and it indicates the size of data segment can be dynamically changed as well?

Answer 1

The bss (block started by symbol) section is tiny in the object file (4 or 8 bytes) but the value stored is the number of bytes of zeroed memory to allocate after the initialized data.

It avoids the stack overflow by allocating the storage 'not on the stack'. It is normally in the data segment, after the text segment and before the start of the heap segment — but that simple memory picture can be more complicated these days.

Officially, there should be caveats about 'the standard doesn't say that there must be a stack' and various other minor bits'n'pieces, but that doesn't alter the substance of the answer. The bss section is small because it is a single number — but the number can represent an awful lot of memory.

Answer 2

Disclaimer: This is not a guide, it is an overview. It is based on how Linux does things, though I may have gotten some details wrong. Most (desktop) operating systems use a very similar model, with different details. Additionally, this only applies to userspace programs. Which is what you're writing unless you're developing for the kernel or working on modules (linux), drivers (windows), kernel extensions (osx).

Virtual Memory: I'll go into more detail below, but the gist is that each process gets an exclusive 32-/64-bit address space. And obviously a process' entire address space does not always map to real memory. This means A) one process' addresses mean nothing to another process and B) the OS decides which parts of a process' address space are loaded into real memory and which parts can stay on disk, at any given point in time.

Executable File Format

Executable files have a number of different sections. The ones we care about here are .text , .data , .bss , and .rodata . The .text section is your code. The .data and .bss sections are global variables. The .rodata section is constant-value 'variables' (aka consts). Consts are things like error strings and other messages, or perhaps magic numbers. Values that your program needs to refer to but never change. The .data section stores global variables that have an initial value. This includes variables defined as <type> <varname> = <value>; . Eg a data structure containing state variables, with initial values, that your program uses to keep track of itself. The .bss section records global variables that do not have an initial value, or that have an initial value of zero. This includes variables defined as <type> <varname>; and <type> <varname> = 0; . Since the compiler and the OS both know that variables in the .bss section should be initialized to zero, there's no reason to actually store all of those zeros. So the executable file only stores variable metadata, including the amount of memory that should be allocated for the variable.

Process Memory Layout

When the OS loads your executable, it creates six memory segments. The bss, data, and text segments are all located together. The data and text segments are loaded (not really, see virtual memory) from the file. The bss section is allocated to the size of all of your uninitialized/zero-initialized variables (see VM). The memory mapping segment is similar to the data and text segments in that it consists of blocks of memory that are loaded (see VM) from files. This is where dynamic libraries are loaded.

The bss, data, and text segments are fixed-size. The memory mapping segment is effectively fixed-size, but it will grow when your program loads a new dynamic library or uses another memory mapping function. However, this does not happen often and the size increase is always the size of the library or file (or shared memory) being mapped.

The Stack

The stack is a bit more complicated. A zone of memory, the size of which is determined by the program, is reserved for the stack. The top of the stack (low memory address) is initialized with the main function's variables. During execution, more variables may be added to or removed from the bottom of the stack. Pushing data onto the stack 'grows' it down (higher memory address), increasing stack pointer (which maintains the address of the bottom of the stack). Popping data off the stack shrinks it up, reducing the stack pointer. When a function is called, the address of the next instruction in the calling function (the return address, within the text segment) is pushed onto the stack. When a function returns, it restores the stack to the state it was in before the function was called (everything it pushed onto the stack is popped off) and jumps to the return address.

If the stack grows too large, the result is dependent on many factors. Sometimes you get a stack overflow. Sometimes the run-time (in your case, the C runtime) tries to allocate more memory for the stack. This topic is beyond the scope of this answer.

The Heap

The heap is used for dynamic memory allocation. Memory allocated with one of the alloc functions lives on the heap. All other memory allocations are not on the heap. The heap starts as a large block of unused memory. When you allocate memory on the heap, the OS tries to find space within the heap for your allocation. I'm not going to go over how the actual allocation process works.

Virtual Memory

The OS makes your process think that it has the entire 32-/64-bit memory space to play in. Obviously, this is impossible; often this would mean your process had access to more memory than your computer physically has; on a 32-bit processor with 4GB of memory, this would mean your process had access to every bit of memory, with no room left for other processes.

The addresses that your process uses are fake. They do not map to actual memory. Additionally, most of the memory in your process' address space is inaccessible, because it refers to nothing (on a 32-bit processor it may not be most). The ranges of usable/valid addresses are partitioned into pages. The kernel maintains a page table for each process.

When your executable is loaded and when your process loads a file, in reality, it is mapped to one or more pages. The OS does not necessarily actually load that file into memory. What it does is create enough entries in the page table to cover the entire file while notating that those pages are backed by a file. Entries in the page table have two flags and an address. The first flag (valid/invalid) indicates whether or not the page is loaded in real memory. If the page is not loaded, the other flag and the address are meaningless. If the page is loaded, the second flag indicates whether or not the page's real memory has been modified since it was loaded and the address maps the page to real memory.

The stack, heap, and bss work similarly, except they are not backed by a 'real' file. If the OS decides that one of your process' pages isn't being used, it will unload that page. Before it unloads the page, if the modified flag is set in the page table for that page, it will save the page to disk somewhere. This means that if a page in the stack or heap is unloaded, a 'file' will be created that now maps to that page.

When your process tries to access a (virtual) memory address, the kernel/memory management hardware uses the page table to translate that virtual address to a real memory address. If the valid/invalid flag is invalid, a page fault is triggered. The kernel pauses your process, locates or makes a free page, loads part of the mapped file (or fake file for the stack or heap) into that page, sets the valid/invalid flag to valid, updates the address, then reruns the original instruction that triggered the page fault.

AFAIK, the bss section is a special page or pages. When a page in this section is first accessed (and triggers a page fault), the page is zeroed before the kernel returns control to your process. This means that the kernel doesn't pre-zero the entire bss section when your process is loaded.

How does the global variable declaration solve the stack overflow in C?

Question

3 answers

solution1
2 ACCPTED 2016-08-12 04:20:58

solution2
2 2016-08-12 18:27:51

Executable File Format

Process Memory Layout

The Stack

The Heap

Virtual Memory

Further Reading

solution3
1 2016-08-12 04:21:42

How does the global variable declaration solve the stack overflow in C?

Question

3 answers

solution1 2 ACCPTED 2016-08-12 04:20:58

solution2 2 2016-08-12 18:27:51

Executable File Format

Process Memory Layout

The Stack

The Heap

Virtual Memory

Further Reading

solution3 1 2016-08-12 04:21:42

solution1
2 ACCPTED 2016-08-12 04:20:58

solution2
2 2016-08-12 18:27:51

solution3
1 2016-08-12 04:21:42