
How can I reproduce zombie processes with bash as PID 1 in Docker?

I have a Docker container that runs bash as PID 1, which in turn runs a long-running (complex) service that sometimes produces zombie processes parented to the bash at PID 1. These zombies are seemingly never reaped.

I'm trying to reproduce this issue in a minimal container so that I can test mitigations, such as using a proper init as PID1 rather than bash.

However, I have been unable to reproduce the zombie processes. The bash at PID1 seems to reap children, even those it inherited from another process.

Here is what I tried:

docker run -d ubuntu:14.04 bash -c \
  'bash -c "start-stop-daemon --background --start --pidfile /tmp/sleep.pid --exec /bin/sleep -- 30; sleep 300"'

My expectation was that start-stop-daemon would double-fork, so that the daemonized process is orphaned and reparented to the bash at PID 1, and then exec into sleep 30; when the sleep exits, I expected the process to remain as a zombie. The sleep 300 simulates a long-running service.

However, bash reaps the process, and I can observe that by running strace on the bash process (from the host machine running docker):

$ sudo strace -p 2051
strace: Process 2051 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9
wait4(-1,

I am running Docker 1.11.1-rc1, though I see the same behaviour with Docker 1.9.

$ docker --version
Docker version 1.11.1-rc1, build c90c70c
$ uname -r
4.4.8-boot2docker

Given that strace shows bash reaping (orphaned) children, is bash a suitable PID1 in a docker container? What else might be causing the zombies I'm seeing in the more complex container? How can I reproduce?

Edit:

I managed to attach strace to a bash PID1 on one of the live containers exhibiting the problem.

Process 20381 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11185
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11191
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11203
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11155
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11151
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11152
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11154
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11332
...

I'm not sure exactly what all those exiting processes are, but none of the PIDs match those of the few defunct (zombie) processes shown by docker exec $id ps aux | grep defunct.

Maybe the trick is to catch it in action and see what wait4() returns on a process that remains a zombie...
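For reference, a zombie is simply a child that has exited but whose parent has not yet called wait() on it; ps marks such a process as <defunct>. A minimal sketch in C, assuming Linux with /proc mounted (proc_state and zombie_state_demo are hypothetical names), creates one on purpose and reads its state letter from /proc/PID/stat:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the one-letter state from /proc/PID/stat ('Z' means zombie, which
 * ps shows as <defunct>), or '?' if it cannot be read. */
static char proc_state(pid_t pid) {
    char path[64];
    char line[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return '?';
    /* The command name is in parentheses and may contain spaces, so the
     * state is the character two positions after the last ')'. */
    char *rparen = strrchr(line, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Hypothetical demo: fork a child that exits at once and do NOT wait for it
 * yet, so it lingers as a zombie. Returns the state observed before reaping. */
static char zombie_state_demo(void) {
    pid_t pid = fork();
    if (pid == -1)
        return '?';
    if (pid == 0)
        _exit(0);                                    /* child: exit immediately */

    struct timespec tick = {0, 10 * 1000 * 1000};    /* 10 ms */
    char state = '?';
    for (int i = 0; i < 100 && state != 'Z'; i++) { /* poll for up to ~1 s */
        nanosleep(&tick, NULL);
        state = proc_state(pid);
    }
    waitpid(pid, NULL, 0);                           /* reap it: the zombie disappears */
    return state;
}
```

zombie_state_demo() should return 'Z'. The final waitpid() removes the zombie, which is exactly the step that a PID 1 that never waits on its adopted children skips.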

I also wanted to verify whether my Jenkins slave containers can generate zombies.

Since my images run the scl binary, which in turn starts the Java JNLP client, I ran the following in the Jenkins slave's Groovy script console:

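// Start a bash that backgrounds "sleep 10", disowns it and exits at once,
// so the sleep is orphaned and reparented to PID 1 (scl in these images).
def process = new ProcessBuilder("bash", "-c", "sleep 10 </dev/null &>/dev/null & disown").redirectErrorStream(true).start()
println process.inputStream.text
// List processes. Once the sleep has exited (after ~10 s), it shows up as <defunct> if PID 1 never reaps it.
println "ps -ef".execute().text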

Zombies were generated, with scl running as PID 1.

Then I looked at your question and decided to try out bash. My first attempt was changing ENTRYPOINT to this:

bash -c "/usr/bin/scl enable rh-ror42 -- /usr/local/bin/run-jnlp-client $1 $2" --

Looking at the ps output, I realized that PID 1 was not bash but still the scl binary. Finally, I changed the command to:

bash -c "/usr/bin/scl enable rh-ror42 -- /usr/local/bin/run-jnlp-client $1 $2 ; ls" --

That is, I appended an arbitrary second command after the scl command. And voilà: bash became PID 1, and no more zombies were generated.

Looking at your example, I see that you run bash -c with more than one command, so your test bed is running something like my last command. In your work containers, however, you probably run bash -c with only one command, and bash optimizes that case by effectively doing an exec of the command instead of forking it. So in the work containers that generate zombies, bash is probably not actually PID 1, contrary to what you expect.

Perhaps you can run ps -ef inside your existing work containers to check whether my guess is correct.
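The single-command exec guess can also be checked outside Docker. A minimal sketch, assuming bash and a Linux /proc are available (run_bash_c is a hypothetical helper): inside bash -c, $$ expands to the PID of that bash, so reading /proc/$$/comm shows which program owns that PID at the moment of the read.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs `bash -c 'SCRIPT'` through popen() and stores the
 * first output line, without its newline, in buf. SCRIPT must not contain
 * single quotes. Returns 0 on success, -1 on failure. */
static int run_bash_c(const char *script, char *buf, size_t len) {
    char cmd[256];
    int n = snprintf(cmd, sizeof cmd, "bash -c '%s'", script);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    if (fgets(buf, (int)len, p) == NULL) {
        pclose(p);
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return pclose(p) == 0 ? 0 : -1;
}
```

With the single command cat /proc/$$/comm, buf should end up as "cat", because bash has replaced itself with cat in the same PID; with cat /proc/$$/comm; true, bash has to stay around to run the second command, and buf should read "bash". This mirrors the difference between the two ENTRYPOINT variants above.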

I hit the same problem while attempting to create a zombie process inside a container with bash as PID 1. It turns out (as the wait4() calls show) that bash does wait on all of its children in a loop (man wait explains that waiting on pid -1 returns when any child exits).

This means that when an orphan is reparented to bash, bash waits on it, so it does not remain a zombie. This is surprising, since much of the literature on the internet says otherwise.
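This reaping behaviour can be reproduced without Docker. A minimal Linux sketch, assuming prctl(PR_SET_CHILD_SUBREAPER) (Linux 3.4 and later) as a stand-in for being PID 1, so that orphaned descendants are reparented to the calling process; reap_like_bash_pid1 is a hypothetical name. It orphans a grandchild as the first half of a double fork does, then loops on waitpid(-1, ...) the way the wait4(-1, ...) calls in the strace output do:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo. Assumption: PR_SET_CHILD_SUBREAPER stands in for being
 * PID 1, i.e. orphaned descendants get reparented to this process.
 * Returns how many children were reaped, or -1 on error. */
static int reap_like_bash_pid1(void) {
    if (prctl(PR_SET_CHILD_SUBREAPER, 1L, 0L, 0L, 0L) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Intermediate process: start a grandchild and exit right away,
         * orphaning it (the first half of a double fork). */
        pid_t grandchild = fork();
        if (grandchild == 0) {
            struct timespec delay = {0, 100 * 1000 * 1000};  /* 100 ms */
            nanosleep(&delay, NULL);                         /* outlive its parent */
            _exit(0);
        }
        _exit(grandchild == -1 ? 1 : 0);
    }

    /* Same pattern as bash's wait4(-1, ...) loop: reap whichever child exits
     * next, including the orphan that was reparented to us. */
    int reaped = 0;
    int status;
    while (waitpid(-1, &status, 0) > 0)
        reaped++;
    /* ECHILD means there is nothing left to wait for. */
    return errno == ECHILD ? reaped : -1;
}
```

It should return 2: the intermediate child plus the adopted grandchild. Replacing the loop with a single waitpid(child, ...) would leave the grandchild as a zombie instead.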

To test whether your applications are leaving zombies, you need to make sure that bash is not PID 1 but rather the first child of PID 1.

In another question, How to reap zombie process in docker container with bash, I showed an example of how to create a container in which zombie processes are ignored: a small program becomes PID 1 and runs bash as a child. Here is the C code that can be used to build the container:

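#include <stdlib.h>

/* Runs as PID 1 and starts bash as a child through system().
 * system() forks "sh -c /bin/bash" and waits only for that specific child,
 * so orphans reparented to PID 1 are never reaped and remain zombies. */
int main(void) {
    int status = system("/bin/bash");
    return status == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}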

The code that generates the zombie and the Dockerfile for the container can be found in the GitHub repository.

After compiling the program into an image, all you need to do is start the container with docker run -ti --rm image /zombie/ignore, and bash will run under PID 1 instead of being PID 1 itself. To see this working in practice, check the link to the other question.

root@1bd66ac87f0a:/zombie# ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:17 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:17 pts/0    00:00:00 sh -c /bin/bash
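For contrast with /zombie/ignore, which only waits for the single child started by system(), a PID 1 that reaps everything has to keep calling waitpid(-1, ...). A minimal sketch of that "proper init" mitigation, where run_and_reap is a hypothetical helper; unlike real init programs such as tini, it does not forward signals to the child:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical minimal reaping init: runs argv as its main child and keeps
 * calling waitpid(-1, ...) so orphans reparented to it are reaped too.
 * Returns the main child's exit code, 128 + signal number if it was killed,
 * or 1 on error. Signals are NOT forwarded. */
static int run_and_reap(char *const argv[]) {
    pid_t main_child = fork();
    if (main_child == -1)
        return 1;
    if (main_child == 0) {
        execvp(argv[0], argv);
        _exit(127);                   /* only reached if exec failed */
    }
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid == -1)
            return 1;                 /* e.g. ECHILD: nothing left to wait for */
        if (pid != main_child)
            continue;                 /* an adopted orphan, now reaped */
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);
    }
}
```

For example, calling run_and_reap on {"sh", "-c", "exit 3", NULL} returns 3. Wrapped in a small main() and used as the container entrypoint, so that it runs as PID 1, it would reap the orphans that /zombie/ignore leaves behind.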

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address.