
C++ how are variables accessed in memory?

When I create a new variable in a C++ program, e.g. a char:

char c = 'a';

how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.

See the docs:

When a variable is declared, the memory needed to store its value is assigned a specific location in memory (its memory address). Generally, C++ programs do not actively decide the exact memory addresses where their variables are stored. Fortunately, that task is left to the environment where the program is run, generally an operating system that decides the particular memory locations at runtime. However, it may be useful for a program to be able to obtain the address of a variable during runtime in order to access data cells that are at a certain position relative to it.
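As a rough illustration of the last sentence, here is a minimal sketch of taking an address with & and reaching a cell at a position relative to it. The names read_relative and third_letter are made up for this example and are not from the docs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads the cell `offset` positions after `base`,
// using nothing but address arithmetic.
char read_relative(const char* base, std::size_t length, std::size_t offset) {
    assert(offset < length);  // stay inside the object we were given
    return *(base + offset);
}

char third_letter() {
    char letters[4] = {'a', 'b', 'c', 'd'};
    const char* p = &letters[0];                 // address picked by the environment, not by us
    return read_relative(p, sizeof letters, 2);  // the cell two positions after letters[0]
}
```

The program never needs to know the numeric value of p; it only works with positions relative to it.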

You can also refer to this article on Variables and Memory:

The Stack

The stack is where local variables and function parameters reside. It is called a stack because it follows the last-in, first-out principle. As data is added or pushed to the stack, it grows, and when data is removed or popped it shrinks. In reality, memory contents are not physically moved around every time data is pushed or popped; instead the stack pointer, which as the name implies points to the memory address at the top of the stack, moves up and down. Everything on one side of this address is considered to be on the stack and usable, whereas everything on the other side is off the stack and invalid (which side is which depends on the direction in which the stack grows on the platform). This bookkeeping is done automatically by code the compiler generates, not by the programmer, and as a result this kind of storage is sometimes also called automatic memory. In C, and in C++ before C++11, the keyword auto could be used to request this storage class explicitly, although that was almost never needed; since C++11, auto means type deduction instead and no longer specifies storage. Normally, one declares variables on the stack like this:

void func()
{
    int i;
    float x[100];
    ...
}

Variables that are declared on the stack are only valid within the scope of their declaration. That means when the function func() listed above returns, i and x will no longer be accessible or valid.
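A small sketch of what that lifetime rule means in practice. The function names are invented for illustration, and the static counter is only there as a contrast to the automatic variable.

```cpp
#include <cassert>

// Returns a copy of a local. The copy outlives the call; the local does not.
int make_value() {
    int i = 42;  // automatic storage: valid only until make_value returns
    return i;    // the value is copied out to the caller
}

// Each call gets a fresh `local`, while `calls` (static storage) persists.
int count_calls() {
    static int calls = 0;  // initialized once, survives between calls
    int local = 0;         // re-created on every call
    ++calls;
    ++local;
    assert(local == 1);    // never remembers anything from a previous call
    return calls * 10 + local;
}
```

Returning the value of i is fine; returning a pointer or reference to i would not be, because the storage is gone once the function returns.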

There is another limitation to variables that are placed on the stack: the operating system only reserves a certain amount of space for the stack. Each time a function is called, its stack frame takes up the amount of memory required to hold its local variables. If the total is greater than the amount of memory that the OS has allowed for the stack, the stack overflows and the program will typically crash. While the maximum size of the stack can sometimes be changed through linker or operating system settings, it is usually fairly small (often on the order of a few megabytes), and nowhere near the total amount of RAM available on a machine.
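One common way around the stack size limit, sketched here under the assumption that the buffer can simply live in dynamically allocated memory, is to use std::vector. The function name sum_large_buffer is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a large scratch buffer. The vector object itself is small and sits on
// the stack; its n elements are allocated dynamically, so a large n does not
// eat into the limited stack space.
double sum_large_buffer(std::size_t n) {
    std::vector<double> buffer(n, 1.0);
    assert(buffer.size() == n);
    double total = 0.0;
    for (double v : buffer) {
        total += v;
    }
    return total;
}
```

A plain double buffer[1000000]; as a local would need about 8 MB of stack, which is at or above the default limit on many systems.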

C++ itself (or rather, the compiler) has access to this variable as part of its own internal representation of the program. Perhaps you're asking how other parts of the program access it at run time.

The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for. If you need to pass it around by reference (or pointer) to other functions, then it will likely be stored on the stack. If you only need it in the context of your function, it will probably be kept in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
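To make those cases concrete, here is a minimal sketch with one variable of each kind. All names are invented for the example, and where the compiler actually puts local is its own decision (stack or register).

```cpp
#include <cassert>
#include <memory>

int global_counter = 7;  // global: lives in the data section

struct Point {
    int x;
    int y;
};

int storage_demo() {
    static int call_count = 0;  // static local: also data/bss, not the stack
    int local = 1;              // automatic: stack or register, compiler's choice
    auto p = std::make_unique<Point>(Point{2, 3});  // the Point itself is on the heap
    assert(p->x == 2 && p->y == 3);  // members are reached at offsets from p
    ++call_count;
    return global_counter + call_count + local + p->x + p->y;
}
```

The first call returns 14 and the second returns 15, because call_count keeps its value between calls while local and the heap object are created anew each time.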

C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.

Assuming this is a local variable, it is typically allocated on the stack, i.e. in RAM. The compiler keeps track of the variable's offset within the stack frame. In the basic scenario, when a computation is performed with the variable, its value is loaded into one of the processor's registers and the CPU performs the computation. Afterwards the result is written back to RAM. In practice, optimizing compilers often keep a variable in a register for its whole lifetime, and modern processors have multiple levels of cache between the registers and RAM, so it can get quite complex.

Please note that the name "c" no longer appears in the binary (unless you have debugging symbols). The binary works only with memory locations. E.g. a simple addition would look like this:

a = b + c

take the value at memory offset 1 and put it in register 1
take the value at memory offset 2 and put it in register 2
sum registers 1 and 2 and store the result in register 3
copy register 3 to memory location 3

The binary doesn't know "a", "b" or "c". The compiler just said "b is in memory 1, c is in memory 2, a is in memory 3". And the CPU just blindly executes the instructions the compiler has generated.
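The four steps can be mimicked with a toy model, purely as an illustration: two small arrays stand in for memory and registers, and everything is addressed by number only. This is not how a real CPU is programmed, and the function name is made up.

```cpp
#include <array>
#include <cassert>

// Toy model of the four steps: no variable names survive, only slot numbers.
int add_via_registers(int x, int y) {
    std::array<int, 4> memory{};  // slots 1..3 are used, slot 0 is ignored
    std::array<int, 4> reg{};     // registers 1..3 are used

    memory[1] = x;  // the two operands already sit in memory
    memory[2] = y;

    reg[1] = memory[1];        // load memory offset 1 into register 1
    reg[2] = memory[2];        // load memory offset 2 into register 2
    reg[3] = reg[1] + reg[2];  // sum registers 1 and 2 into register 3
    memory[3] = reg[3];        // copy register 3 to memory location 3

    assert(memory[3] == x + y);
    return memory[3];
}
```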

Let's say our program starts with a stack address of 4000000.

When you call a function, depending on how much stack it uses, it will "allocate" it like this.

Let's say we have 2 ints (8 bytes, assuming a 4-byte int):

int function()
{
    int a = 0;
    int b = 0;
    return a + b; // a non-void function must return a value
}

then what happens in assembly is roughly:

MOV EBP, ESP ; save the original stack address (4000000) in EBP, so ESP can be restored to 4000000 at the end of the function

SUB ESP, 8 ; "allocate" 8 bytes on the stack, which just decreases the address in ESP by 8

So our ESP address changed from 4000000 to 3999992.

That's how the program knows the stack addresses: one int occupies 3999992 to 3999995 and the other occupies 3999996 to 3999999 (which variable gets which slot is up to the compiler).
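The address arithmetic can be checked with a few lines of C++. This only simulates the numbers from the example; it assumes 4-byte ints, and the Frame struct and function name are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Frame {
    std::uint32_t esp;    // stack pointer after the "allocation"
    std::uint32_t lower;  // start address of the lower 4-byte slot
    std::uint32_t upper;  // start address of the upper 4-byte slot
};

// Simulates SUB ESP, 8 for two 4-byte ints.
Frame allocate_two_ints(std::uint32_t esp_before) {
    const std::uint32_t slot = 4;  // assumed sizeof(int)
    assert(esp_before >= 2 * slot);
    const std::uint32_t esp = esp_before - 2 * slot;
    return Frame{esp, esp, esp + slot};
}
```

Calling allocate_two_ints(4000000) gives esp = 3999992, with the two slots starting at 3999992 and 3999996.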

Even though this has more to do with the machine than with the C++ language itself, it's really important to know, because once you know how stack is "allocated", you realize how cheap it is to do things like

char my_array[20000]; since all it's doing is sub esp, 20000, which is a single assembly instruction.

But if you actually use all those bytes, e.g. memset(my_array, 0, 20000), that's a different story.
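A small sketch of the difference between reserving the array and actually touching it. The function name is invented; std::memset takes the destination, the fill value, and the byte count, in that order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

std::size_t fill_and_count() {
    char my_array[20000];  // reserving: just a stack pointer adjustment
    assert(sizeof my_array == 20000);

    std::memset(my_array, 'x', sizeof my_array);  // touching: writes every byte

    std::size_t n = 0;
    for (char ch : my_array) {  // reading every byte back also costs time
        if (ch == 'x') {
            ++n;
        }
    }
    return n;
}
```

The declaration costs about the same whether the array has 20 or 20000 elements; the memset and the loop are what scale with the size.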

how does C++ then have access to this variable in memory?

It doesn't!

Your computer does, and it is instructed on how to do that by machine instructions that load the location of the variable in memory into a register. This all happens at the level of machine code, of which assembly language is the human-readable form. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".


 