python-twisted: fork for background non-returning processing

How do I correctly fork a child process in Twisted that does not use anything from Twisted (but does use data from the parent process), e.g. to process a “snapshot” of some data from the parent process and write it to a file without blocking?

It seems that if I do anything like a clean shutdown in the child process after os.fork(), it closes some of the sockets/descriptors in the parent process. The only way I see to avoid that is to call os.kill(os.getpid(), signal.SIGKILL), which does seem like a bad idea (though not directly problematic).

(Additionally, if a dict is changed in the parent process, can that change show up in the child process too? A quick test shows that it doesn't. The OS/kernels are Debian stable/sid.)
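
The result of that quick test is expected: after fork() the child gets its own copy-on-write view of the parent's memory, so a change made afterwards in either process is invisible to the other. A standalone sketch (plain Python on Linux, no reactor involved) that mutates a dict in the parent after forking and asks the child what it sees, using two pipes purely for synchronization; the variable names are illustrative:

```python
import os

state = {"counter": 1}

go_r, go_w = os.pipe()    # parent -> child: "I have changed my dict"
ans_r, ans_w = os.pipe()  # child -> parent: the value the child sees

pid = os.fork()
if pid == 0:
    # Child process.
    os.close(go_w)
    os.close(ans_r)
    os.read(go_r, 1)  # block until the parent has mutated its dict
    os.write(ans_w, str(state["counter"]).encode())
    # _exit skips atexit handlers and stdio flushing inherited from the parent.
    os._exit(0)

# Parent process.
os.close(go_r)
os.close(ans_w)
state["counter"] = 2  # mutate after the fork
os.write(go_w, b"x")
child_saw = int(os.read(ans_r, 16))
os.waitpid(pid, 0)
os.close(go_w)
os.close(ans_r)
print("parent:", state["counter"], "child saw:", child_saw)  # parent: 2 child saw: 1
```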

IReactorProcess.spawnProcess (usually available as reactor.spawnProcess, after from twisted.internet import reactor) can spawn a process running any executable available on your system. The subprocess does not need to use Twisted, or indeed even be written in Python.

Do not call os.fork yourself. As you've discovered, it has lots of very peculiar interactions with process state that spawnProcess will manage for you.

Among the problems with os.fork are:

  • Forking copies your current process state, but only the thread that called fork() survives in the child; the other threads simply vanish. This means that any thread that was in the middle of modifying some global state leaves it half-broken in the child, possibly holding locks that will never be released. Don't run any threads in your application? Have you audited every library you use, and every one of its dependencies, to ensure that none of them has ever used, or ever will use, a background thread for anything?
  • You might think you're only touching certain areas of your application's memory, but thanks to Python's reference counting, any object that you even peripherally look at (or that is present on the stack) may have its reference count incremented or decremented. Incrementing or decrementing a refcount is a write, which means the whole page containing that object (not just the object itself) gets copied into the forked process's private memory. So forked processes in Python tend to accumulate a much larger copied set than, say, forked C programs.
  • Many libraries, famously the system libraries on macOS and iOS, cannot handle fork() correctly and will simply crash your program if you use them after fork but before exec.
  • There's a flag telling file descriptors to close on exec (FD_CLOEXEC), but no such flag to have them close on fork. So any files (including log files and, again, any background temp files opened by libraries you might not even be aware of) can get silently corrupted or truncated if you don't manage access to them carefully.
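
The first point can be reproduced in a few lines of plain Python on Linux (a standalone sketch, outside of any reactor; names are illustrative): a background thread holds a threading.Lock at the moment of fork(), so the child inherits a lock that is marked as held but has no thread left to release it. A timeout keeps the demonstration from actually hanging. For exactly this reason, Python 3.12 and later emit a DeprecationWarning when os.fork() is called in a multi-threaded process.

```python
import os
import threading

lock = threading.Lock()
holding = threading.Event()
release = threading.Event()


def worker():
    # Hold the lock until told to let go.
    with lock:
        holding.set()
        release.wait()


t = threading.Thread(target=worker)
t.start()
holding.wait()  # from here on the worker thread owns the lock

pid = os.fork()
if pid == 0:
    # Child: only the thread that called fork() exists here, and the copied
    # lock still looks held. Nobody will ever release it, so time out.
    acquired = lock.acquire(timeout=1.0)
    os._exit(0 if acquired else 1)

# Parent: releasing the lock here does not touch the child's copy.
release.set()
t.join()
_, status = os.waitpid(pid, 0)
child_got_lock = os.WEXITSTATUS(status) == 0
print("child acquired the lock:", child_got_lock)
```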
