Is it possible to fork a process without inheriting the virtual memory space of the parent process?

Because the parent process uses a huge amount of memory, fork may fail with errno set to ENOMEM under some configurations of the kernel overcommit policy, even though the child process may only exec a program with low memory consumption, such as ls.

To clarify the problem: when /proc/sys/vm/overcommit_memory is set to 2, allocation of (virtual) memory is limited to SWAP + MEMORY * ratio (the ratio defaults to 50%). When a process forks, memory is not copied, thanks to copy-on-write (COW), but the kernel still needs to allocate virtual memory space for the child. As an analogy, fork is like malloc(size of the virtual memory space): it does not allocate physical memory up front, and a page is copied and physical memory allocated only when a shared page is written to. When overcommit_memory is set to 2, fork may therefore fail because of this virtual memory space allocation.
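To see how close a system is to that limit, the kernel exposes the current figures in /proc/meminfo as CommitLimit and Committed_AS. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper name meminfo_kb is hypothetical and only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Returns the value (in kB) of a /proc/meminfo field such as "CommitLimit",
 * or -1 if the file or the field cannot be read. */
long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;

    char line[256];
    long value = -1;
    size_t len = strlen(field);

    while (fgets(line, sizeof line, f)) {
        /* Lines look like "CommitLimit:    8123456 kB". */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

With overcommit_memory set to 2, meminfo_kb("CommitLimit") minus meminfo_kb("Committed_AS") is roughly the headroom left; a fork of a process whose private writable mappings exceed it can be expected to fail with ENOMEM.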

Is it possible to fork a process without inheriting the virtual memory space of the parent process under the following conditions?

  1. The child process calls exec after fork.

  2. The child process doesn't call exec and will not use any global or static variables from the parent process. For example, the child process just does some logging and then quits.

No, it is not possible. You might be interested in vfork(2), which I don't recommend. Also look into mmap(2) and its MAP_NORESERVE flag. But the kernel uses copy-on-write techniques, so in practice you won't double the RAM consumption.

My suggestion is to have enough swap space that this issue is not a concern. So set up your computer to have more available swap space than the largest running process. You can always create a temporary swap file (e.g. with dd if=/dev/zero of=/var/tmp/swapfile bs=1M count=32768 then mkswap /var/tmp/swapfile), add it as a temporary swap zone (swapon /var/tmp/swapfile), and remove it (swapoff /var/tmp/swapfile and rm /var/tmp/swapfile) when you don't need it anymore.

You probably don't want to swap on a tmpfs file system, as /tmp/ often is, since tmpfs file systems are themselves backed by swap space!

I dislike memory overcommitment and I disable it (through proc(5)). YMMV.

I'm not aware of any way to do (2), but for (1) you could try vfork, which creates a new process without copying the page tables of the parent process. This generally isn't recommended for a number of reasons, including that it causes the parent to block until the child performs an execve or terminates.
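For case (1), the vfork approach can be sketched roughly as follows. This is only an illustration under the usual vfork restrictions (the child must do nothing but exec or _exit, since it runs in the parent's memory); the function name run_with_vfork is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] via vfork + execvp and waits for it.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int run_with_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: it borrows the parent's address space, so it must only
         * exec or _exit. The parent stays suspended until one happens. */
        execvp(argv[0], argv);
        _exit(127);            /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child never gets its own copy of the address space, no additional virtual memory has to be committed for it, which is what sidesteps the ENOMEM from fork.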

As Basile Starynkevitch answered, it's not possible.

There is, however, a very simple and common solution for this that does not rely on Linux-specific behaviour or memory overcommit control: use an early-forked slave process to do the fork and exec.

Have the large parent process create a Unix domain socket and fork a slave process as early as possible, closing all other descriptors in the slave (and reopening STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO to /dev/null). I prefer a datagram socket for its simplicity and guarantees, although a stream socket will also work.

In some rare cases it is useful to have the slave process execute a separate dedicated small helper program. In most instances this is not necessary, and makes security design much easier. (In Linux, you can include SCM_CREDENTIALS ancillary messages when passing data over a Unix domain socket, and use the process ID therein to verify the identity/executable of the peer using the /proc/PID/exe pseudo-file.)

In any case, the slave process blocks reading from the socket. When the other end closes the socket, the read/receive returns 0, and the slave process exits.

Each datagram the slave process receives describes a command to execute. (Using a datagram allows using C strings, delimited with NUL characters, without any escaping etc.; using a Unix stream socket typically requires you to delimit the "command" somehow, which in turn means escaping the delimiters in the command component strings.)
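One possible encoding for such a datagram is simply the argument strings laid end to end, each with its terminating NUL. The sketch below illustrates that idea; the function names pack_command and unpack_command and the exact layout are assumptions for illustration, not something the answer specifies.

```c
#include <stddef.h>
#include <string.h>

/* Packs a NULL-terminated argv into buf as "arg0\0arg1\0...".
 * Returns the number of bytes used, or 0 if it does not fit. */
size_t pack_command(char *buf, size_t cap, char *const argv[])
{
    size_t used = 0;
    for (size_t i = 0; argv[i] != NULL; i++) {
        size_t n = strlen(argv[i]) + 1;      /* include the NUL */
        if (n > cap - used)
            return 0;
        memcpy(buf + used, argv[i], n);
        used += n;
    }
    return used;
}

/* Splits a received datagram back into a NULL-terminated argv that points
 * into buf. Returns the argument count, or -1 if the datagram is malformed
 * or has too many arguments for the argv array. */
int unpack_command(char *buf, size_t len, char *argv[], int max_args)
{
    if (len == 0 || buf[len - 1] != '\0')
        return -1;

    int argc = 0;
    for (size_t off = 0; off < len; off += strlen(buf + off) + 1) {
        if (argc >= max_args - 1)
            return -1;
        argv[argc++] = buf + off;
    }
    argv[argc] = NULL;
    return argc;
}
```

The parent would send the packed buffer as one datagram, and the slave would pass the unpacked argv straight to an exec call. Since datagram boundaries are preserved, no length prefix or escaping is needed.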

The slave process creates one or more pipes, and forks a child process. This child process closes the original Unix socket, replaces the standard streams with the respective pipe ends (closing the other ends), and executes the desired command. I personally prefer to use an extra close-on-exec socket in Linux to detect successful execution; in an error case, the errno code is written to the socket, so that the slave-parent can reliably detect the failure and the exact reason, too. On success, the slave-parent closes the unnecessary pipe ends and replies to the original process about the success, with the other pipe ends as SCM_RIGHTS ancillary data. After sending the message, it closes the rest of the pipe ends, and waits for a new message.
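The close-on-exec trick for detecting exec failure can be sketched as below. Note the substitutions: this uses a plain pipe rather than the socket the answer mentions, it leaves out the stdio pipes and the SCM_RIGHTS reply, and the name spawn_checked is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks and execs argv. Returns 0 and stores the child's PID in *pid_out
 * if the exec succeeded; otherwise returns the errno describing the failure. */
int spawn_checked(char *const argv[], pid_t *pid_out)
{
    int fds[2];
    if (pipe(fds) < 0)
        return errno;
    /* The write end closes automatically on a successful exec. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        return e;
    }

    pid_t pid = fork();
    if (pid < 0) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        return e;
    }
    if (pid == 0) {
        close(fds[0]);
        execvp(argv[0], argv);
        int err = errno;                     /* exec failed: report why */
        if (write(fds[1], &err, sizeof err) < 0) {
            /* Nothing useful can be done here. */
        }
        _exit(127);
    }

    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);  /* 0 bytes means exec succeeded */
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == (ssize_t)sizeof err) {
        waitpid(pid, NULL, 0);               /* reap the failed child */
        return err;
    }
    *pid_out = pid;
    return 0;
}
```

The helper process is small, so the fork here is cheap regardless of how large the original parent has grown.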

On the original process side, the above procedure is sequential; only one thread may start executing an external process at a time. (You simply serialize the access with a mutex.) Several external processes can run at the same time; it is only the request to and response from the slave helper that is serialized.

If that is an issue (it should not be in typical cases), you can for example multiplex the connections by prefixing each message with an ID number (assigned by the parent process, monotonically increasing). In that case, you'll probably use a dedicated thread on the parent end to manage the communications with the slave, as you certainly cannot have multiple threads reading from the same socket at the same time and expect deterministic results.

Further improvements to the scheme include things like using a dedicated process group for the executed processes, setting limits on them (by setting limits on the slave process), and executing the commands as dedicated users and groups by using a privileged slave.

The privileged slave case is where it is most useful to have the parent execute a separate helper process for it. In Linux, both sides can use SCM_CREDENTIALS ancillary messages via Unix domain sockets to verify the identity (the PID and, with the PID, the executable) of the peer, making it rather straightforward to implement robust security. (But note that /proc/PID/exe has to be checked more than once, to catch attacks where a message is sent by a nefarious program that then quickly executes the appropriate program, but with command-line arguments that cause it to exit soon. This makes it occasionally look like the correct executable made the request, while a copy of the descriptor, and thus the entire communications channel, was under the control of a nefarious user.)

In summary, the original problem can be solved, although the answer to the posed question is No. If the executions are security-sensitive, for example if they change privileges (user accounts) or capabilities (in Linux), then the design has to be carefully considered, but in normal cases the implementation is quite straightforward.

I'd be happy to elaborate if necessary.

This is possible on Linux. Use the clone syscall without the CLONE_THREAD flag and with the CLONE_VM flag. The parent and child processes will use the same mappings, much like a thread would; there is no COW or page table copying.
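A minimal sketch of that clone call, using the glibc wrapper, might look like this. The function names and the 64 KiB stack size are assumptions for illustration. The child needs its own stack because it runs concurrently in the same address space, and it should avoid most libc calls for the same reason.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Runs in the child. With CLONE_VM the store lands in memory
 * that the parent sees as well. */
static int child_fn(void *arg)
{
    *(int *)arg = 42;
    return 0;              /* becomes the child's exit status */
}

/* Creates a child that shares the parent's address space and returns
 * the value the child stored, or -1 on failure. */
int clone_vm_demo(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;

    int result = 0;
    /* The stack grows downward on common architectures, so pass its top.
     * SIGCHLD makes the child waitable like an ordinary forked process. */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &result);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;
    return result;
}
```

With a plain fork the parent would still see 0 in result; here it sees the child's write, which shows that no separate copy of the address space was created. Adding CLONE_VFORK to the flags would additionally suspend the parent until the child execs or exits, much like vfork.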

madvise(addr, size, MADV_DONTFORK)
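As a rough illustration of what MADV_DONTFORK does, the sketch below marks an anonymous mapping and then forks. The child does not inherit the mapping, so touching it there raises SIGSEGV. The function name is hypothetical, and the demonstration assumes Linux.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child faulted when touching a MADV_DONTFORK region,
 * 0 if the child could read it, and -1 on error. */
int dontfork_hides_region(size_t size)
{
    char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 'x', size);

    /* Ask the kernel not to duplicate this range into children. */
    if (madvise(buf, size, MADV_DONTFORK) < 0) {
        munmap(buf, size);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, size);
        return -1;
    }
    if (pid == 0) {
        /* The mapping does not exist in the child, so this read faults. */
        char c = *(volatile char *)buf;
        (void)c;
        _exit(0);
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(buf, size);
    if (waited != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

In a real program the large buffers would be marked this way right after allocation, so that a later fork only has to account for the remaining, much smaller mappings.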

Alternatively, you can call munmap() after fork() to remove the virtual addresses inherited from the parent process.
