
How the virtual address of the process memory is calculated?

I wrote the following program to examine a process's memory layout:

#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

#define CHAR_LEN 255

char filepath[CHAR_LEN];
char line[CHAR_LEN];
char address[CHAR_LEN];
char perms[CHAR_LEN];
char offset[CHAR_LEN];
char dev[CHAR_LEN];
char inode[CHAR_LEN];
char pathname[CHAR_LEN];

int main(void) {
  printf("Hello world.\n");

  snprintf(filepath, sizeof(filepath), "/proc/%u/maps", (unsigned)getpid());
  FILE *f = fopen(filepath, "r");
  if (f == NULL) {
    perror(filepath);
    return 1;
  }

  printf("%-32s %-8s %-10s %-8s %-10s %s\n", "address", "perms", "offset",
         "dev", "inode", "pathname");
  while (fgets(line, sizeof(line), f) != NULL) {
    /* Anonymous mappings have no pathname field, so clear it first;
       otherwise sscanf leaves the previous line's pathname in place. */
    pathname[0] = '\0';
    sscanf(line, "%s%s%s%s%s%s", address, perms, offset, dev, inode, pathname);
    printf("%-32s %-8s %-10s %-8s %-10s %s\n", address, perms, offset, dev,
           inode, pathname);
  }

  fclose(f);
  return 0;
}

I compile the program with gcc -static -O0 -g -std=gnu11 -o test_helloworld_memory_map test_helloworld_memory_map.c -lpthread and then run readelf -l test_helloworld_memory_map, which prints:

Elf file type is EXEC (Executable file)
Entry point 0x400890
There are 6 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000c9e2e 0x00000000000c9e2e  R E    200000
  LOAD           0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
                 0x0000000000001c98 0x0000000000003db0  RW     200000
  NOTE           0x0000000000000190 0x0000000000400190 0x0000000000400190
                 0x0000000000000044 0x0000000000000044  R      4
  TLS            0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
                 0x0000000000000020 0x0000000000000050  R      8
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
                 0x0000000000000148 0x0000000000000148  R      1

 Section to Segment mapping:
  Segment Sections...
   00     .note.ABI-tag .note.gnu.build-id .rela.plt .init .plt .text __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata __libc_subfreeres __libc_atexit .stapsdt.base __libc_thread_subfreeres .eh_frame .gcc_except_table
   01     .tdata .init_array .fini_array .jcr .data.rel.ro .got .got.plt .data .bss __libc_freeres_ptrs
   02     .note.ABI-tag .note.gnu.build-id
   03     .tdata .tbss
   04
   05     .tdata .init_array .fini_array .jcr .data.rel.ro .got

Then, I run the program and obtain:

address                          perms    offset     dev      inode      pathname
00400000-004ca000                r-xp     00000000   fd:01    12551992   /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
006c9000-006cc000                rw-p     000c9000   fd:01    12551992   /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
006cc000-006ce000                rw-p     00000000   00:00    0          /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map
018ac000-018cf000                rw-p     00000000   00:00    0          [heap]
7ffc2845c000-7ffc2847d000        rw-p     00000000   00:00    0          [stack]
7ffc28561000-7ffc28563000        r--p     00000000   00:00    0          [vvar]
7ffc28563000-7ffc28565000        r-xp     00000000   00:00    0          [vdso]
ffffffffff600000-ffffffffff601000 r-xp     00000000   00:00    0          [vsyscall]

I'm confused about why the addresses of the segments differ from those shown in "/proc/[pid]/maps". For example, readelf reports the 2nd LOAD segment at file offset 0xc9eb8 and virtual address 0x6c9eb8, but in the process memory map the corresponding mapping starts at 0x6c9000 with file offset 0xc9000. How is this calculation done?

I know the linker specifies 0x400000 as the starting address of the first segment, and that the process memory map shows addresses aligned to the page size (4K); e.g., 0xc9e2e rounded up to 0xca000, plus 0x400000, gives the end address 0x4ca000. I think this has something to do with the "Align" column shown by readelf. However, the description of the program header in the ELF man page confuses me:

  p_align This member holds the value to which the segments are aligned in memory and in the file. Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size. Values of zero and one mean no alignment is required. Otherwise, p_align should be a positive, integral power of two, and p_vaddr should equal p_offset, modulo p_align.

Specifically, what does the last sentence mean?

Otherwise, p_align should be a positive, integral power of two, and p_vaddr should equal p_offset, modulo p_align.

What's the calculation formula it is talking about?

Thanks much!

CPU address mapping has page granularity, and 4K is still a very common page size. /proc/$pid/maps shows you the OS mappings; it doesn't show which addresses inside the mapped ranges your process actually cares about. Your process only cares about what starts at offset 0xeb8 into the first mapped page, but the CPU (and hence the OS that controls it for you) cannot map at byte granularity. The linker knows this, so it lays out the file on disk so that it can be mapped in CPU-page-sized blocks.

It means that for segments other than loadable ones (i.e. those whose type is not LOAD), the last n bits of the offset must match the last n bits of the virtual address, where the value of the p_align field is 1 << n.

For example, GNU_STACK says the stack can be placed anywhere, as long as its address is 16-byte aligned (readelf prints Align in hex, so 10 means 0x10 = 16).

For loadable segments, the offset and the virtual address must agree at least modulo the page size. Take the second one from your example:

  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
                 0x0000000000001c98 0x0000000000003db0  RW     200000

Given a page size of 4096, the last 12 bits of the offset must be the same as the last 12 bits of the virtual address (here both end in 0xeb8). This is because the loader maps pages directly from the file into memory with mmap, which works only at page granularity. For the main executable this is done by the kernel's ELF loader (this -static build has no dynamic linker at all); the dynamic linker does the same for shared libraries. So the first part of this range really was mapped from the file:

006c9000-006cc000                rw-p     000c9000   fd:01    12551992   /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map

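Note also that the file size (0x1c98) is smaller than the memory size (0x3db0). The remaining bytes (.bss) must read as zero; whatever does not fit into the last file-backed page is placed in a separate zero-filled mapping: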

006cc000-006ce000                rw-p     00000000   00:00    0          /home/zeyuanhu/share/380L-Spring19/lab3/src/test_helloworld_memory_map

If you read the bytes at 0x00000000006c9000 - 0x00000000006c9eb7, you should see exactly the same bytes as those at 0x00000000004c9000 - 0x00000000004c9eb7. This is because the code and data segments come right after each other in the file without padding, so the file page at offset 0xc9000 is mapped twice. This saves a lot of disk space, and it actually helps save RAM too, because the executable takes less space in the block device caches.
