How does fork() mark the parent's PTEs as read-only?

Question

I've searched through a lot of resources, but found nothing concrete on the matter:

I know that on some Linux systems, the fork() syscall works with copy-on-write: the parent and the child share the same physical pages, but the PTEs are marked read-only so that COW can be used later. When either process tries to write to a page, a page fault occurs and the page is copied to another location, where it can be modified.

However, I cannot understand how the OS reaches the shared PTEs to mark them as read-only. I have hypothesized that when a fork() syscall occurs, the OS performs a "page walk" on the parent's page table and marks the entries read-only, but I can find no confirmation of this, or any information about the process.

Does anyone know how the pages come to be marked as read only? Will appreciate any help. Thanks!

Answer 1

Linux implements the fork syscall by iterating over all memory ranges (mmaps, stack, and heap) of the parent process. Each of these ranges (a VMA, virtual memory area) is copied by the function copy_page_range (mm/memory.c), which loops over the page table entries:

copy_page_range will iterate over pgd and call
copy_pud_range to iterate over pud and call
copy_pmd_range to iterate over pmd and call
copy_pte_range to iterate over pte and call
copy_one_pte which does memory usage accounting (RSS) and has several code segments to handle COW case:

    /*
     * If it's a COW mapping, write protect it both
     * in the parent and the child
     */
    if (is_cow_mapping(vm_flags)) {
        ptep_set_wrprotect(src_mm, addr, src_pte);
        pte = pte_wrprotect(pte);
    }

where is_cow_mapping is true for private and potentially writable mappings (the flags bitfield is checked for the shared and maywrite bits, and only the maywrite bit may be set):

    #define VM_SHARED   0x00000008
    #define VM_MAYWRITE 0x00000020

    static inline bool is_cow_mapping(vm_flags_t flags)
    {
        return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
    }
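To see which flag combinations the check accepts, it can be compiled in user space. The following is only a sketch: VM_SHARED and VM_MAYWRITE use the values quoted above, while the vm_flags_t typedef, the VM_WRITE value, and the check_cow_examples helper are added here for illustration and are not part of the quoted code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a plain unsigned long stands in for the kernel's vm_flags_t. */
typedef unsigned long vm_flags_t;

/* VM_SHARED and VM_MAYWRITE as quoted above; VM_WRITE is added for the examples. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

static inline bool is_cow_mapping(vm_flags_t flags)
{
    return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical helper (not a kernel function): runs the check on a few
 * flag combinations and aborts if a result differs from what is expected. */
static void check_cow_examples(void)
{
    /* Private and writable: COW. */
    assert(is_cow_mapping(VM_WRITE | VM_MAYWRITE));
    /* Private, not writable right now, but may become writable: still COW. */
    assert(is_cow_mapping(VM_MAYWRITE));
    /* Shared and writable: both processes keep writing to the same page. */
    assert(!is_cow_mapping(VM_SHARED | VM_WRITE | VM_MAYWRITE));
    /* Can never be written: nothing to write-protect. */
    assert(!is_cow_mapping(0));
}
```

Note that the check looks at VM_MAYWRITE rather than VM_WRITE, so a private mapping that is read-only at fork time but could later be made writable is still treated as COW.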

PUD, PMD, and PTE are described in books like https://www.kernel.org/doc/gorman/html/understand/understand006.html and in articles like LWN 2005: "Four-level page tables merged".
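The nested walk can be illustrated with a toy model. This is not kernel code: it assumes only two levels (a directory of PTE tables) instead of pgd/pud/pmd/pte, and every name starting with toy_ or TOY_ is hypothetical. The bit positions for "present" and "writable" follow the x86 layout, which is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PRESENT        0x1u  /* entry maps a page */
#define TOY_RW             0x2u  /* page is writable */
#define TOY_DIR_ENTRIES    4
#define TOY_PTES_PER_TABLE 8

typedef uint64_t toy_pte_t;

struct toy_pte_table { toy_pte_t pte[TOY_PTES_PER_TABLE]; };
struct toy_mm        { struct toy_pte_table *dir[TOY_DIR_ENTRIES]; };

/* Innermost loop: copy each present entry to the child. For a COW mapping,
 * clear the writable bit in the parent's entry and in the child's copy.
 * Returns how many entries were write-protected. */
static int toy_copy_pte_range(struct toy_pte_table *dst,
                              struct toy_pte_table *src, bool cow)
{
    int count = 0;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < TOY_PTES_PER_TABLE; i++) {
        toy_pte_t pte = src->pte[i];

        if (!(pte & TOY_PRESENT))
            continue;
        if (cow && (pte & TOY_RW)) {
            src->pte[i] &= ~(toy_pte_t)TOY_RW;  /* parent's entry */
            pte &= ~(toy_pte_t)TOY_RW;          /* child's copy */
            count++;
        }
        dst->pte[i] = pte;
    }
    return count;
}

/* Outer loop: allocate a child table for every populated directory slot
 * and descend into it. Returns the total count, or -1 if allocation fails. */
static int toy_copy_page_range(struct toy_mm *dst, struct toy_mm *src, bool cow)
{
    int count = 0;

    for (size_t d = 0; d < TOY_DIR_ENTRIES; d++) {
        if (src->dir[d] == NULL)
            continue;
        dst->dir[d] = calloc(1, sizeof(*dst->dir[d]));
        if (dst->dir[d] == NULL)
            return -1;
        count += toy_copy_pte_range(dst->dir[d], src->dir[d], cow);
    }
    return count;
}

/* Release the tables allocated for a child by toy_copy_page_range. */
static void toy_free_mm(struct toy_mm *mm)
{
    for (size_t d = 0; d < TOY_DIR_ENTRIES; d++) {
        free(mm->dir[d]);
        mm->dir[d] = NULL;
    }
}
```

After the copy, parent and child entries point at the same page frames and neither is writable, so the first write from either side faults.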

How the fork implementation reaches copy_page_range:

The fork syscall implementation (sys_fork, defined via SYSCALL_DEFINE0(fork)) is do_fork (kernel/fork.c), which will call
copy_process, which will call many copy_* functions, including
copy_mm, which calls
dup_mm to allocate and fill a new mm struct, where most of the work is done by
dup_mmap (still kernel/fork.c), which will check what was mmaped and how. (Here I was unable to find the exact path to the COW implementation, so I used a web search with something like "fork+COW+dup_mm" to get hints like [1] or [2] or [3].) After checking the mapping types, the line retval = copy_page_range(mm, oldmm, mpnt); does the real work.
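The result of this call chain can be observed from user space. The sketch below assumes a Linux or other POSIX system with MAP_ANONYMOUS; the function name child_write_visible_to_parent is made up for this example. It maps one page either privately or shared, forks, lets the child write to the page, and reports whether the parent sees the child's write.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a write made by the child after fork()
 * is visible to the parent, 0 if it is not, and -1 on error.
 * sharing_flag must be MAP_PRIVATE or MAP_SHARED. */
static int child_write_visible_to_parent(int sharing_flag)
{
    assert(sharing_flag == MAP_PRIVATE || sharing_flag == MAP_SHARED);

    int *p = mmap(NULL, sizeof(*p), PROT_READ | PROT_WRITE,
                  sharing_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *p = 1;  /* touch the page so it is populated before fork() */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(*p));
        return -1;
    }
    if (pid == 0) {
        /* Child: for a private mapping this write faults on the
         * write-protected PTE and the child gets its own copy. */
        *p = 2;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(p, sizeof(*p));
        return -1;
    }

    int seen = *p;
    munmap(p, sizeof(*p));
    return seen == 2;
}
```

With MAP_PRIVATE the mapping passes the is_cow_mapping check, so the parent should still read 1 and the function returns 0. With MAP_SHARED it does not pass, both processes keep using the same page, and the function returns 1.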
