Why doesn't address change in forked process?

Question

I'm trying to understand fork() and process address spaces. I wrote a basic proof of concept program that forks a new process and changes a variable in the new process. My expectation was that when I change a variable in the child, this should cause that variable to get a new address. If I understand correctly, Linux does copy-on-write with fork. So I would expect the variable address in the parent and child to match until I change it in one of them. Then I would expect them to be different. However, that's not what I'm seeing.

Is this because with copy-on-write a new page is allocated from physical memory, but the process address space is unchanged - just remapped to the new page by the TLB? Or am I misunderstanding this, or did I make a dumb mistake in my program?

Proof of concept code:

#include <cstdlib>   // exit
#include <iostream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void describe(const std::string &descr, const int &data) {
    pid_t ppid = getppid();
    pid_t pid = getpid();

    std::cout << "In " << descr << ":\n"
              << "Parent Process ID:  " << ppid
              << "\nMy Process ID:  " << pid
              << "\nValue of data:  " << data
              << "\nAddress of data:  " << &data << "\n\n";
}

void change(int &data) {
    // Should cause data to get new page frame:
    data *= 2;
}

int main () {
    int data = 42;
    int status;

    pid_t pid = fork();

    switch(pid) {
        case -1:
            std::cerr << "Error:  Failed to successfully fork a process.\n";
            exit(1);
            break;
        case 0:
            // In forked child
            describe("Child", data);
            // Lazy way to wait for parent to run describe:
            usleep(1'000);
            break;
        default:
            // In calling parent
            describe("Parent", data);
            // Lazy way to wait for child to run describe:
            usleep(1'000);
    }

    if (pid == 0) {
        std::cout << "Only change data in child...\n";
        change(data);
        describe("Child", data);
    } else {
        // Lazy way to wait for child to change data:
        usleep(1'000);
        describe("Parent", data);
    }

    // Wait for child:
    if (pid != 0) {
        wait(&status);
    }

    return 0;
}

Example run:

ubuntuvm:~$ ./example
In Parent:
Parent Process ID:  265569
My Process ID:  316986
Value of data:  42
Address of data:  0x7fffb63878d4

In Child:
Parent Process ID:  316986
My Process ID:  316987
Value of data:  42
Address of data:  0x7fffb63878d4

Only change data in child...
In Child:
Parent Process ID:  316986
My Process ID:  316987
Value of data:  84
Address of data:  0x7fffb63878d4

In Parent:
Parent Process ID:  265569
My Process ID:  316986
Value of data:  42
Address of data:  0x7fffb63878d4

Answer 1

My expectation was that when I change a variable in the child, this should cause that variable to get a new address.

No, because they are virtual addresses.

If I understand correctly, Linux does copy-on-write with fork. So I would expect the variable address in the parent and child to match until I change it in one of them.

A new physical page will be used somewhere, but the virtual address can (and will) stay the same.

Is this because with copy-on-write a new page is allocated from physical memory, but the process address space is unchanged - just remapped to the new page by the TLB?

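Yes. Strictly speaking, the kernel does the remapping by updating the child's page table entry, and the TLB only caches those translations, but the effect is what you describe. Otherwise it would be far less useful. If it worked the way you expected, any pointer you held before the fork would suddenly become invalid. Think about code as simple as: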

int * p = new int;

if (!fork()) {
    // the child

    *p = 42;

    // now `p` is invalid since we wrote to it?!

    // another read or write would segfault!
    *p = 43;
}

In a way, it would be like running a live program inside one of those games where the platforms (pages, for us) fall away once you step on them. Quite fun! :)

We could consider fixing the problem by having the operating system or the CPU somehow rewrite your pointers with the new address when that happens, to keep everything working.

However, even if that were possible, there are more issues. For instance, you need to handle allocations that span several pages. Consider the stack (Linux applies CoW to the stack on fork() too). As soon as you wrote anything to the stack, you would have to update the stack pointer and copy all of its pages, not just the modified one.

Then we have to deal with indirect pointers, pointers stored inside data structures, pointers that do not point to the start of an allocation, etc. It seems impossible to solve without tracking which registers and pointers need to be updated for each possible future write (or using a different implementation of C pointers altogether, as @R mentions; the same goes for registers, etc.).

1 answer

solution1
16 ACCEPTED 2020-09-10 14:09:28
