简体   繁体   中英

Duration and scope of a forked process in large scale Unix C applications

We're dealing with C code on a Unix system at school, and we want to fork processes to split the application into several processes. Looking for conceptual help with the nitty gritty of how forks work.

For example, I understand that when you fork(), a new [child] process is created with an identical memory space to the parent process (shared memory aside). In this context, what is the "parent process" though? How much of the code base is packed into a given process before the OS breaks the application apart into smaller processes? Or am I thinking of "processes" all wrong?

Example, if you have a 100,000 line program that has a fork() call somewhere on the 70,000th line, is the entire program and memory that has been built up over the runtime of the application duplicated even if only a small bit of the parent's data is necessary for the child's success? Is a large program like this split up into smaller processes to mitigate the duplication load of a fork() call? If so, where are the splits? If not, how can you optimize around this? Should applications not be this large to begin with?

A similar question, when a child process is created that is identical to the parent, when can it be assumed to be terminated? Using the above example of a 100,000 line program, if there is a fork at line 100, will the forked child run the entirety of the rest of the program even if that is not needed? How can you avoid this? Are there design factors that our class isn't delving into? We are living in an academic codebase where all our programs are 50 lines or less but I'm trying to learn big-picture concepts for my edification.

Thanks!

You are completely right with your understanding of fork() - it is supposed to do an exact copy of the calling process, effectively doubling memory requirements of the program.

In practice (when wisely used), most of this overhead is optimized away with the help of most modern OS's virtual memory implementation: first, the OS will only consider writable segments for copy (text segments - your code - isn't considered writeable on most operating segments), so both processes will share the same code segments. With the remaining data segments the OS will do copy on write, so data segments will only be duplicated when the child first attempts to write them.

Copies of the data segments will be done on a page-by-page basis, so with a "not so smart mix" of parent and child data scattered across pages you will still be able to force duplicating most of the parents data unused to the child's memory space. With the classic fork() - exec() pair most of the data will be effectively be separated.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM