I have a Python
script that starts a daemon
process. I was able to do this by using the code found at: https://gist.github.com/marazmiki/3618191 .
The code starts the daemon
process exactly as expected. However, sometimes , and only sometimes, when the daemon
process is stopped, the running job is zombied.
The stop
function of the code is:
def stop(self):
"""
Stop the daemon
"""
# Get the pid from the pidfile
try:
pf = file(self.pidfile, 'r')
pid = int(pf.read().strip())
pf.close()
except:
pid = None
if not pid:
message = "pidfile %s does not exist. Daemon not running?\n"
sys.stderr.write(message % self.pidfile)
return # not an error in a restart
# Try killing the daemon process
try:
while 1:
os.kill(pid, SIGTERM)
time.sleep(1.0)
except OSError, err:
err = str(err)
if err.find("No such process") > 0:
if os.path.exists(self.pidfile):
os.remove(self.pidfile)
else:
print str(err)
sys.exit(1)
When this stop()
method is run, the process ( pid
) appears to hang, and when I Control+C
out, I see the script is KeyboardInterrupted
on the line time.sleep(1.0)
, which leads me to believe that the line:
os.kill(pid, SIGTERM)
is the offending code.
Does anyone have any idea why this could be happening? Why would this os.kill()
would force a process to become a zombie?
I am running this on Ubuntu linux
(if it matters).
UPDATE : I'm including my start()
method per @paulus's answer.
def start(self):
"""
Start the daemon
"""
pid = None
# Check for a pidfile to see if the daemon already runs
try:
pf = file(self.pidfile, 'r')
pid = int(pf.read().strip())
pf.close()
except:
pid = None
if pid:
message = "pidfile %s already exist. Daemon already running?\n"
sys.stderr.write(message % self.pidfile)
sys.exit(1)
# Start the daemon
self.daemonize()
self.run()
UPDATE 2 : And here is the daemonize()
method:
def daemonize(self):
"""
do the UNIX double-fork magic, see Stevens' "Advanced
Programming in the UNIX Environment" for details (ISBN 0201563177)
http://www.erlenstar.demon.co.uk/unix/faq_2.html#SEC16
"""
try:
pid = os.fork()
if pid > 0:
# exit first parent
sys.exit(0)
except OSError, e:
sys.stderr.write("fork #1 failed: %d (%s)\n" % (e.errno, e.strerror))
sys.exit(1)
# decouple from parent environment
os.chdir("/")
os.setsid()
os.umask(0)
# do second fork
try:
pid = os.fork()
if pid > 0:
# exit from second parent
sys.exit(0)
except OSError, e:
sys.stderr.write("fork #2 failed: %d (%s)\n" % (e.errno, e.strerror))
sys.exit(1)
# redirect standard file descriptors
sys.stdout.flush()
sys.stderr.flush()
sys.stdout = file(self.stdout, 'a+', 0)
si = file(self.stdin, 'r')
so = file(self.stdout, 'a+')
se = file(self.stderr, 'a+', 0)
os.dup2(si.fileno(), sys.stdin.fileno())
os.dup2(so.fileno(), sys.stdout.fileno())
os.dup2(se.fileno(), sys.stderr.fileno())
# write pidfile
atexit.register(self.delpid)
pid = str(os.getpid())
file(self.pidfile, 'w+').write("%s\n" % pid)
You're looking in the wrong direction. The flawed code is not the one in the stop routine but it is in the start one (if you're using the code from gist). Double fork is a correct method, but the first fork should wait for the child process, not simply quit.
The correct sequence of commands (and the reasons to do the double fork) can be found here: http://lubutu.com/code/spawning-in-unix (see the "Double fork" section).
The sometimes you mention is happening when the first parent dies before getting SIGCHLD and it doesn't get to init.
As far as I remember, init should periodically read exit codes from it's children besides signal handling, but the upstart version simply relies on the latter (therefore the problem, see the comment on the similar bug: https://bugs.launchpad.net/upstart/+bug/406397/comments/2 ).
So the solution is to rewrite the first fork to actually wait for the child.
Update: Okay, you want some code. Here it goes: pastebin.com/W6LdjMEz I've updated the daemonize, fork and start methods.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.