简体   繁体   中英

Why does fork() fail on MacOs Big Sur if the executable that runs it is deleted?

If a running process's executable is deleted, I've noticed fork fails where the child process is never executed.

For example, consider the code below:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
  sleep(5);
  pid_t forkResult;
  forkResult = fork();

  printf("after fork %d \n", forkResult);
  return 0;
}

If I compile this and delete the resulting executable before fork is called, I never see fork return a pid of 0, meaning the child process never starts. I only have a Mac running Big Sur, so not sure if this repros on other OS's.

Does anyone know why this would be? My understanding is an executable should work just fine even if it's deleted while still running.

The expectation that the process should continue even if the binary was deleted is correct, however not fully correct in case of macOS . The example is tripping on a side-effect of the System Integrity Protection ( SIP ) mechanism inside the macOS kernel, however before explaining what is exactly going on, we need to make several experiments which will help us to better understand the whole scenario.

Modified example to better demonstrate the issue

To demonstrate what is going on, I had modified the example to count to 9, than do the fork, after the fork, the child will print a message "I am done", wait 1 second and exit by printing the 0 as the PID. The parent will continue to count to 14 and print the child PID. The code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
  for(int i=0; i <10; i++)
  {
     sleep(1);
     printf("%i ", i);
  }
  pid_t forkResult;
  forkResult = fork();
  if (forkResult != 0) {
     for(int i=10; i < 15; i++) {
        sleep(1);
        printf("%i ", i);
     }
  } else {
     sleep(1);
     printf("I am done ");
  }

  printf("after fork %d \n", forkResult);
  return 0;
}

After compiling it, I have started the normal scenario:

╰> ./a.out
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 4385

So, the normal scenario works as expected. The fact that we see the count from 0 to 9 two times, is due to the copy of the buffers for stdout that was done in the fork call.

Tracing the failing example

Now is time to do the negative scenario, we will wait for 5 seconds after the start and remove the binary.

╰> ./a.out & (sleep 5 && rm a.out)
[4] 8555
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 8677
[4]    8555 done       ./a.out

We see that the output is only from the parent. Since the parent had counted to 14, and shows valid PID for the child, however the child is missing, it never printed anything. So, the child creation failed after the fork() was performed, otherwise fork() would have received and error instead of a valid PID . Traces from ktrace reveal that the child was created under the pid and was waken up:

test5-ko.txt:2021-04-07 13:34:26.623783 +04          0.3              MACH_DISPATCH                        1bc              0                84               4                888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04          0.2              TMR_TimerCallEnter                   9931ba49ead1bd17 0                330e7e4e9a59     41               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04          0.0(0.0)         TMR_TimerCallEnter                   9931ba49ead1bd17 0                330e7e4e9a59     0                888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04          0.0              TMR_TimerCallEnter                   9931ba49ead1bd17 0                330e7e4e9a59     0                888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04          0.0              imp_thread_qos_and_relprio           88775d           20000            20200            6                888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04          0.0              imp_update_thread                    88775d           811200           140000100        1f               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04          0.1(0.8)         imp_update_thread                    88775d           c15200           140000100        25               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04          0.0(1.1)         imp_thread_qos_and_relprio           88775d           30000            20200            40               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04          0.0              imp_thread_qos_workq_override        88775d           30000            20200            0                888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04          0.0              imp_update_thread                    88775d           c15200           140000100        25               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04          0.1(0.1)         imp_update_thread                    88775d           c15200           140000100        25               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04          0.0(0.2)         imp_thread_qos_workq_override        88775d           30000            20200            40               888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623857 +04          1.3              TURNSTILE_turnstile_added_to_thread_heap 88775d           9931ba6049ddcc77 0                0                888065           2   a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623858 +04          1.0              MACH_MKRUNNABLE                      88775d           25               0                5                888065           2   a.out(8677)
t

So the child's process was dispatched with MACH_DISPATCH and made runnable with MACH_MKRUNNABLE . This is the reason the parent got valid PID after the fork() .

Further more the ktrace for the normal scenario shows that the process had issued BSC_exit and and imp_task_terminated system call occurred, which is the normal way for a process to exit. However, in the second scenario where we had deleted the file, the trace doesn't show BSC_exit . This means that the child was terminated by the kernel, not by a normal termination. And we know that the termination happend after the child was created properly, since the parent had received the valid PID and the PID was made runnable.

This bring us closer to the understanding of what is going on here. But, before we have the conclusion, let's show another even more "twisted" example.

Even more strange example

What if we replace the binary on the filesystem after we started the process?

Here is the test to answer this question: we will start the process, remove the binary and create an empty file with the same name on his place with touch .

╰> ./a.out & (sleep 5 && rm a.out; touch a.out)
[1] 6264
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 6851
[1]  + 6722 done       ./a.out

Wait a minute, this works?? What is going on here??!?

This strange example gives us important clue that will help us to explain what is going on.

The root-cause of the issue

The reason why the third example works, while the second one is failing, reveals a lot of what is going on here. As mentioned on the beginning, we are tripping on a side-effect of SIP , more precisely on the runtime protection mechanism.

To protect the system integrity, SIP will examine the running processes for the system protection and special entitlement . From the apple documentation: ...When a process is started, the kernel checks to see whether the main executable is protected on disk or is signed with an special system entitlement. If either is true, then a flag is set to denote that it is protected against modification. Any attempt to attach to a protected process is denied by the kernel...

When we had removed the binary from the filesystem, the protection mechanism was not able to identify the type of process for the child nor the special system entitlements since the binary file was missing from the disk. This triggered the protection mechanism to treat this process as an intruder in the system and terminate it, hanse we had not seen the BSC_exit for the child process.

In the third example, when we created dummy entry on the file system with touch , the SIP was able to detect that this is not a special process nor it has special entitlements and allowed the process to continue. This is a very solid indication that we ware tripping on the SIP realtime protection mechanism.

To prove that this is the case, I have disabled the SIP which requires a restart in the recovery mode and executed the test

╰> csrutil status
System Integrity Protection status: disabled.
╰> ./a.out & (sleep 5 && rm a.out)
[1] 1504
0 1 2 3 4 5 6 7 8 9 I am done after fork 0 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 1626

Conclusion

So, the whole issue was caused by the System Integrity Protection . More details can be fond in the documentation

All the SIP needed was to have a file on the filesystem with the process name, so the mechanism can run the verification and decide to allow the child to continue the execution. This is showing us that we are observing a side-effect, rather than designed behavior, since the empty file was not even a valid dwarf , yet the execution had proceed.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM