
What Happens When I Call fork() in Unix?

Original post: 2011-09-17 13:38:10 | Tags: c, unix, fork, process

I've tried to look this up, but I'm struggling a bit to understand the relation between the Parent Process and the Child Process immediately after I call fork().

Are they completely separate processes, only associated by the id/parent id? Or do they share memory? For example, the 'code' section of each process: is that duplicated so that each process has its own identical copy, or is it 'shared' in some way so that only one exists?

I hope that makes sense.

In the name of full disclosure, this is 'homework related'; while not a direct question from the book, I have a feeling it's mostly academic and, in practice, I probably don't need to know.

4 Answers

From each process's point of view, the entire memory is duplicated.

In reality, the kernel uses a "copy-on-write" scheme. The first time either process modifies its memory after fork(), a separate copy is made of the modified page (usually 4 kB).

Usually the code segment of a process is not modified, in which case it remains shared.
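The visible effect of this can be checked with a small experiment. The following is only a sketch under stated assumptions: the function name value_seen_by_parent_after_child_write is hypothetical, and the code shows the semantics (each process sees its own copy), not the kernel's page-level bookkeeping, which cannot be observed this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original answer): forks, lets the child
 * overwrite a variable, and returns the value the parent still sees
 * afterwards, or -1 on error. */
int value_seen_by_parent_after_child_write(void)
{
    volatile int value = 1;          /* lives on the stack, a writable page */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: this write only affects the child's view of memory. */
        value = 42;
        _exit(value == 42 ? 0 : 1);
    }

    /* Parent: wait for the child so its write has definitely happened. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return value;                    /* the parent's copy is untouched */
}
```

Calling this from a main() and printing the result should show that the child's assignment never reaches the parent.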

Logically, a fork creates an identical copy of the original process that is largely independent of the original. For performance reasons, memory is shared with copy-on-write semantics, which means that unmodified memory (such as code) remains shared.

File descriptors are duplicated, so that the forked process could, in principle, take over a database connection on behalf of the parent (or they could even jointly communicate with the database if the programmer is a bit twisted). More commonly, this is used to set up pipes between processes so you can write find -name '*.c' | xargs grep fork.
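As an illustration of descriptors surviving fork(), here is a minimal sketch in which the child writes into a pipe created by the parent. The function name read_from_child and its parameters are made up for this example; a shell sets up its pipelines in a broadly similar way but then also calls exec in each child.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child sends msg through a pipe whose descriptors
 * it inherited across fork(); the parent reads it into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_from_child(const char *msg, char *buf, size_t buflen)
{
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */

    if (buflen == 0)
        return -1;
    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: both descriptors are valid here; keep only the write end. */
        close(fds[0]);
        ssize_t written = write(fds[1], msg, strlen(msg));
        _exit(written == (ssize_t)strlen(msg) ? 0 : 1);
    }

    /* Parent: close the write end, otherwise read() never reports EOF. */
    close(fds[1]);

    size_t total = 0;
    ssize_t n = 0;
    while (total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';

    close(fds[0]);
    waitpid(pid, NULL, 0);           /* reap the child */
    return n < 0 ? -1 : (ssize_t)total;
}
```

Note that each side closes the end it does not use; forgetting this in the parent is a common reason for a reader that blocks forever.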

A bunch of other stuff is shared. See here for details.

One important omission is threads: the child process only inherits the thread that called fork(). This causes no end of trouble in multithreaded programs, since the state of mutexes, etc., that were locked in the parent is implementation-specific (and don't forget that malloc() and printf() use locks internally). The only safe thing to do in the child after fork() returns is to call execve() as soon as possible, and even then you have to be cautious with file descriptors. See here for the full horror story.
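The "fork, then execve() right away" pattern might look like the sketch below. The helper name run_and_wait is hypothetical, and the example assumes the caller passes a full path to an existing program; the child deliberately does nothing between fork() and execve() except call _exit() if the exec fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;               /* the current process environment */

/* Hypothetical helper: runs the program at path with the given argv and
 * returns its exit status, or -1 on error or abnormal termination. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: no malloc(), no printf(), just replace the process image. */
        execve(path, argv, environ);
        _exit(127);                  /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the file-descriptor caveat, one option is to mark descriptors that the new program should not inherit with FD_CLOEXEC (via fcntl()) before forking.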

They are separate processes, i.e. the Child and the Parent have separate PIDs.
The Child inherits all of the open descriptors from the Parent.
Internally, the writable pages, i.e. the stack/heap regions (unlike the .text region, which is not modified), are shared between the Parent and the Child until one of them tries to modify the contents. At that point a new page is allocated, the contents of the page being modified are copied into it, and it is mapped into the address space of whichever process caused the change, Parent or Child. This is called COW, copy-on-write (mentioned in the other answers here).
The Child can finish execution, and until it is reclaimed by the Parent with a wait() or waitpid() call it remains in the ZOMBIE state. Reaping it clears the child's entry from the process table and allows the child's PID to be reused. Usually when a child dies, the SIGCHLD signal is sent to the parent, which ideally has a handler installed that calls wait().
If the Parent exits without cleaning up its still-running or zombie children (via wait() or waitpid()), the init process (PID 1) becomes the parent of these now-orphaned children. init routinely calls wait() or waitpid() to reap them.
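The SIGCHLD-handler approach from the list above could be sketched as follows. The names on_sigchld, reaped_children and fork_and_reap_in_handler are invented for this example, and the handler stays installed after the function returns; a real program would decide for itself how long to keep it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every child that has exited, so none stays a zombie. */
static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;         /* waitpid() may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Hypothetical helper: forks one child that exits immediately and waits
 * until the handler has reaped it. Returns the number of children reaped
 * (expected: 1), or -1 on error. */
int fork_and_reap_in_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) < 0) {
        perror("sigaction");
        return -1;
    }

    /* Block SIGCHLD around fork() so the signal cannot arrive before
     * we are ready to sleep in sigsuspend(). */
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    reaped_children = 0;
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);                    /* child: exit at once */

    while (reaped_children == 0)
        sigsuspend(&wait_mask);      /* atomically unblock SIGCHLD and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return (int)reaped_children;
}
```

The handler loops with WNOHANG because several children may exit before the handler runs once, and it restores errno so that interrupted code is not confused.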

EDIT: typos. HTH

Yes, they are separate processes, but with some special "properties". One of them is the child-parent relation.

But more important is the sharing of memory pages in a copy-on-write (COW) manner: until one of them performs a write (to a global variable or whatever) on a page, the memory pages are shared. When a write is performed, a copy of that page is created by the kernel and mapped at the right address.

The COW magic is done in the kernel by marking the pages as read-only and using the page-fault mechanism.