简体   繁体   中英

Confusion of virtual memory

Consider a sample below.

char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';

The 4KB memory is allocated by calling malloc() . OS handles the memory request by the user program in user-space. First, OS requests memory allocation to RAM, then RAM gives physical memory address to OS. Once OS receives physical address, OS maps the physical address to virtual address then OS returns the virtual address which is the address of p to user program.

I wrote some value(a and b) in virtual address and they are really written into main memory(RAM). I'm confusing that I wrote some value in virtual address, not physical address, but it is really written to main memory(RAM) even though I didn't care about them.

What happens in behind? What OS does for me? I couldn't found relevant materials in some books(OS, system programming). Could you give some explanation? (Please omit the contents about cache for easier understanding)

You have to understand that virtual memory is virtual, and it can be more extensive than physical memory RAM, so it is mapped differently. Although they are actually the same.

Your programs use virtual memory addresses, and it is your OS who decides to save in RAM. If it fills up, then it will use some space on the hard drive to continue working.

But the hard drive is slower than the RAM, that's why your OS uses an algorithm, which could be Round-Robin, to exchange pages of memory between the hard drive and RAM, depending on the work being done, ensuring that the data that are most likely to be used are in fast memory. To swap pages back and forth, the OS does not need to modify virtual memory addresses.

Summary overlooking a lot of things

You want to understand how virtual memory works. There's lots of online resources about this, here's one I found that seems to do a fair job of trying to explain it without getting too crazy in technical details, but also doesn't gloss over important terms.

https://searchstorage.techtarget.com/definition/virtual-memory

For Linux on x86 platforms, the assembly equivalent of asking for memory is basically a call into the kernel using int 0x80 with some parameters for the call set into some registers. The interrupt is set at boot by the OS to be able to answer for the request. It is set in the IDT.

An IDT descriptor for 32 bits systems looks like:

struct IDTDescr {
   uint16_t offset_1; // offset bits 0..15
   uint16_t selector; // a code segment selector in GDT or LDT
   uint8_t zero;      // unused, set to 0
   uint8_t type_attr; // type and attributes, see below
   uint16_t offset_2; // offset bits 16..31
};

The offset is the address of the entry point of the handler for that interrupt. So interrupt 0x80 has an entry in the IDT. This entry points to an address for the handler(also called ISR). When you call malloc(), the compiler will compile this code to a system call. The system call returns in some register the address of the allocated memory. I'm pretty sure as well that this system call will actually use the sysenter x86 instruction to switch into kernel mode. This instruction is used alongside an MSR register to securely jump into kernel mode from user mode at the address specified in the MSR (Model Specific Register).

Once in kernel mode, all instructions can be executed and access to all hardware is unlocked. To provide with the request the OS doesn't "ask RAM for memory". RAM isn't aware of what memory the OS uses. RAM just blindly answers to asserted pins on it's DIMM and stores information. The OS just checks at boot using the ACPI tables that were built by the BIOS to determine how much RAM there is and what are the different devices that are connected to the computer to avoid writing to some MMIO (Memory Mapped IO). Once the OS knows how much RAM is available (and what parts are usable), it will use algorithms to determine what parts of available RAM every process should get.

When you compile C code, the compiler (and linker) will determine the address of everything right at compilation time. When you launch that executable the OS is aware of all memory the process will use. So it will set up the page tables for that process accordingly. When you ask for memory dynamically using malloc(), the OS determines what part of physical memory your process should get and changes (during runtime) the page tables accordingly.

As to paging itself, you can always read some articles. A short version is the 32 bits paging. In 32 bits paging you have a CR3 register for each CPU core. This register contains the physical address of the bottom of the Page Global Directory. The PGD contains the physical addresses of the bottom of several Page Tables which themselves contain the physical addresses of the bottom of several physical pages ( https://wiki.osdev.org/Paging ). A virtual address is split into 3 parts. The 12 bits to the right (LSB) are the offset in the physical page. The 10 bits in the middle are the offset in the page table and the 10 MSB are the offset in the PGD.

So when you write

char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';

you create a pointer of type char* and making a system call to ask for 4096 bytes of memory. The OS puts the first address of that chunk of memory into a certain conventional register (which depends on the system and OS). You should not forget that the C language is just a convention. It is up to the operating system to implement that convention by writing a compatible compiler. It means that the compiler knows what register and what interrupt number to use (for the system call) because it was specifically written for that OS. The compiler will thus take the address stored into this certain register and store it into this pointer of type char* during runtime. On the second line you are telling the compiler that you want to take the char at the first address and make it an 'a'. On the third line you make the second char a 'b'. In the end, you could write an equivalent:

char* p = (char*)malloc(4096);
*p = 'a';
*(p + 1) = 'b';

The p is a variable containing an address. The + operation on a pointer increments this address by the size of what is stored in that pointer. In this case, the pointer points to a char so the + operation increments the pointer by one char (one byte). If it was pointing to an int then it would be incremented of 4 bytes (32 bits). The size of the actual pointer depends on the system. If you have a 32 bits system then the pointer is 32 bits wide (because it contains an address). On a 64 bits system the pointer is 64 bits wide. A static memory equivalent of what you did is

char p[4096];
p[0] = 'a';
p[1] = 'b';

Now the compiler will know at compile time what memory this table will get. It is static memory. Even then, p represents a pointer to the first char of that array. It means you could write

char p[4096];
*p = 'a';
*(p + 1) = 'b';

It would have the same result.

First, OS requests memory allocation to RAM,…

The OS does not have to request memory. It has access to all of memory the moment it boots. It keeps its own database of which parts of that memory are in use for what purposes. When it wants to provide memory for a user process, it uses its own database to find some memory that is available (or does things to stop using memory for other purposes and then make it available). Once it chooses the memory to use, it updates its database to record that it is in use.

… then RAM gives physical memory address to OS.

RAM does not give addresses to the OS except that, when starting, the OS may have to interrogate the hardware to see what physical memory is available in the system.

Once OS receives physical address, OS maps the physical address to virtual address…

Virtual memory mapping is usually described as mapping virtual addresses to physical addresses. The OS has a database of the virtual memory addresses in the user process, and it has a database of physical memory. When it is fulfilling a request from the process to provide virtual memory and it decides to back that virtual memory with physical memory, the OS will inform the hardware of what mapping it choose. This depends on the hardware, but a typical method is that the OS updates some page table entries that describe what virtual addresses get translated to what physical addresses.

I wrote some value(a and b) in virtual address and they are really written into main memory(RAM).

When your process writes to virtual memory that is mapped to physical memory, the processor will take the virtual memory address, look up the mapping information in the page table entries or other database, and replace the virtual memory address with a physical memory address. Then it will write the data to that physical memory.

A detailed answer to your question will be very long - and too long to fit here at StackOverflow.

Here is a very simplified answer to a little part of your question.

You write:

I'm confusing that I wrote some value in virtual address, not physical address, but it is really written to main memory

Seems you have a very fundamental misunderstanding here.

There is no memory directly "behind" a virtual address. Whenever you access a virtual address in your program, it is automatically translated to a physical address and the physical address is then used for access in main memory.

The translation happens in HW, ie inside the processor in a block called "MMU - Memory management unit" (see https://en.wikipedia.org/wiki/Memory_management_unit ).

The MMU holds a small but very fast look-up table that tells how a virtual address is to be translated into a physical address. The OS configures this table but after that, the translation happens without any SW being involved and - just to repeat - it happens whenever you access a virtual memory address.

在此处输入图像描述

The MMU also takes some kind of process ID as input in order to do the translation. This is need because two different processes may use the same virtual address but they will need translation to two different physical addresses.

As mentioned above the MMU look-up table (TLB) is small so the MMU can't hold a all translations for a complete system. When the MMU can't do a translation, it can make an exception of some kind so that some OS software can be triggered. The OS will then re-program the MMU so that the missing translation gets into the MMU and the process execution can continue. Note: Some processors can do this in HW, ie without involving the OS.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM