Why doesn't the address change in a forked process?

Question

I'm trying to understand fork() and process address spaces. I wrote a basic proof-of-concept program that forks a new process and changes a variable in the child. My expectation was that changing a variable in the child would cause that variable to get a new address. If I understand correctly, Linux does copy-on-write with fork, so I would expect the variable's address in the parent and the child to match until I change it in one of them, and to differ afterwards. However, that's not what I'm seeing.

Is this because, with copy-on-write, a new page is allocated from physical memory but the process address space is unchanged, just remapped to the new page by the TLB? Or am I misunderstanding this, or did I make a dumb mistake in my program?

Proof-of-concept code:

#include <cstdlib>     // exit
#include <iostream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void describe(const std::string &descr, const int &data) {
    pid_t ppid = getppid();
    pid_t pid = getpid();

    std::cout << "In " << descr << ":\n"
              << "Parent Process ID:  " << ppid
              << "\nMy Process ID:  " << pid
              << "\nValue of data:  " << data
              << "\nAddress of data:  " << &data << "\n\n";
}

void change(int &data) {
    // Should cause data to get new page frame:
    data *= 2;
}

int main () {
    int data = 42;
    int status;

    pid_t pid = fork();

    switch(pid) {
        case -1:
            std::cerr << "Error:  Failed to successfully fork a process.\n";
            exit(1);
            break;
        case 0:
            // In forked child
            describe("Child", data);
            // Lazy way to wait for parent to run describe:
            usleep(1'000);
            break;
        default:
            // In calling parent
            describe("Parent", data);
            // Lazy way to wait for child to run describe:
            usleep(1'000);
    }

    if (pid == 0) {
        std::cout << "Only change data in child...\n";
        change(data);
        describe("Child", data);
    } else {
        // Lazy way to wait for child to change data:
        usleep(1'000);
        describe("Parent", data);
    }

    // Wait for child:
    if (pid != 0) {
        wait(&status);
    }

    return 0;
}

Example run:

ubuntuvm:~$ ./example
In Parent:
Parent Process ID:  265569
My Process ID:  316986
Value of data:  42
Address of data:  0x7fffb63878d4

In Child:
Parent Process ID:  316986
My Process ID:  316987
Value of data:  42
Address of data:  0x7fffb63878d4

Only change data in child...
In Child:
Parent Process ID:  316986
My Process ID:  316987
Value of data:  84
Address of data:  0x7fffb63878d4

In Parent:
Parent Process ID:  265569
My Process ID:  316986
Value of data:  42
Address of data:  0x7fffb63878d4

Answer 1

My expectation was that when I change a variable in the child, this should cause that variable to get a new address.

No, because the addresses you are printing are virtual addresses.

If I understand correctly, Linux does copy-on-write with fork. So I would expect the variable address in the parent and child to match until I change it in one of them.

A new physical page will be used somewhere, but the virtual address can (and will) stay the same.

Is this because with copy-on-write a new page is allocated from physical memory, but the process address space is unchanged - just remapped to the new page by the TLB?

Of course, although strictly speaking it is the kernel that remaps the page by updating the page-table entry; the TLB only caches translations, and the stale entry is invalidated. Without this, copy-on-write would be far less useful: if it worked the way you describe, any pointer you had before the fork would suddenly become invalid. Think about code as simple as:

#include <unistd.h>

int main() {
    int * p = new int;

    if (!fork()) {
        // the child

        *p = 42;

        // if CoW moved the page, `p` would now be invalid since we wrote to it?!

        // and another read or write would segfault!
        *p = 43;
    }

    delete p;
    return 0;
}

In a way, it would be like running your program in one of those games where the platforms (pages, for us) fall away as soon as you step on them. Quite fun! :)

We could try to fix the problem by having the operating system or the CPU (somehow) rewrite your pointers to the new address whenever that happens, so that everything keeps working.

However, even if that were possible, we would have more issues. For instance, you would need to take care of allocations that span several pages. Imagine the stack (Linux applies CoW to the stack on fork() as well). As soon as you wrote anything to the stack, you would have to update the stack pointer and copy all of its pages, not just the modified one, because the stack has to stay contiguous.

Then we would have to handle indirect pointers, pointers stored inside data structures, pointers that do not point to the start of an allocation, and so on. It seems impossible to solve without tracking which registers and pointers need to be updated for each possible future write (or implementing C pointers differently altogether, as @R mentions; the same goes for registers).

Answer 1 (score 16, accepted, 2020-09-10 14:09:28)
