Bizarre hang in Python when subprocess returns an error

I have a Python program designed to evolve a 3D model which is CFD-analyzed in OpenFOAM. The analysis is run in parallel by a program called "mpirun"; my Python script runs mpirun via subprocess.Popen. Nothing unusual so far. What is unusual is that when mpirun encounters an error with one of its children, kills its children, and prints an error, the Python parent process freezes. And it doesn't freeze at some obvious place, like reading from the pipe: it seems to stop at random locations and just stops doing anything.

I tried running my program with "python3 -m trace --trace" to see which line it stops on. Here's the final output:

foam.py(1765):       print("-B")
-B
foam.py(1766):       if match:
foam.py(1776):       print("-A")
-A
foam.py(1777):       if re.match(" *Sum of moments *", line_text):
 --- modulename: re, funcname: match
re.py(163):     return _compile(pattern, flags).match(string)
 --- modulename: re, funcname: _compile
re.py(280):     try:
re.py(281):         p, loc = _cache[type(pattern), pattern, flags]
re.py(282):         if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
re.py(283):             return p
foam.py(1780):       print("A")
A
foam.py(1781):       if force_mode:

As you can see, it gets up to "if force_mode:" and then just stops. Obviously "if bool" should never hang. I've been trying to figure this out for several days and I'm no closer to an answer.

It doesn't seem to make a difference how I start the process via subprocess.Popen - shell=True, shell=False, running "mpirun" directly, running it through a bash wrapper script... nothing matters (the only thing I've kept consistent is stdout=subprocess.PIPE, since I have to be able to read the output). As soon as one of mpirun's children dies and it reports its error, foam.py just hangs.
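For reference, the setup described above (launch a command with stdout=subprocess.PIPE and read its output) might look roughly like the sketch below. The helper name stream_output is hypothetical and not taken from foam.py, and any command can stand in for mpirun. One thing worth keeping in mind: a read on a pipe only sees end-of-file once every process holding the write end has closed it.

```python
import subprocess
import sys


def stream_output(cmd):
    """Launch cmd and yield its stdout one line at a time (hypothetical helper)."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        # Each iteration blocks until a full line arrives or the pipe hits EOF.
        # EOF only happens when *all* writers of the pipe have closed it.
        for line in proc.stdout:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Stand-in child process; a real run would pass the mpirun command line here.
    for text in stream_output([sys.executable, "-c", "print('a'); print('b')"]):
        print(text)
```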

Any clue what might be going on here? I'm stumped. :(

Answer: It turned out that running the program's output through "tee" to log it, so that I could examine all of the messages, was actually misleading me. Because the output was buffered on its way through tee, I wasn't seeing the last messages to be printed. After removing tee, I was able to see that it was actually hanging on a pipe read. I was able to fix this by looking for the death messages and then calling kill on the process behind the pipe.
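A minimal sketch of that kind of fix, under stated assumptions: the function name run_and_watch, the FAKE_CHILD stand-in command, and the "FATAL" pattern are all hypothetical, since the actual death messages printed by mpirun are not shown here. The idea is to stop reading as soon as a line matches the pattern and kill the process, rather than waiting for an end-of-file that may never come.

```python
import re
import subprocess
import sys

# Stand-in for mpirun (assumption): prints a fake error, then hangs instead of exiting.
FAKE_CHILD = [
    sys.executable, "-u", "-c",
    "import time; print('step 1'); print('FATAL: child died'); time.sleep(60)",
]


def run_and_watch(cmd, death_pattern):
    """Run cmd, collect its output, and kill it once a line matches death_pattern."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # error reports often go to stderr
        text=True,
        bufsize=1,                 # line-buffered on our side of the pipe
    )
    lines = []
    killed = False
    for line in proc.stdout:
        lines.append(line.rstrip("\n"))
        if re.search(death_pattern, line):
            proc.kill()            # stop the process instead of waiting for EOF
            killed = True
            break                  # and stop reading, so we cannot block again
    proc.stdout.close()
    proc.wait()                    # reap the process so it does not linger as a zombie
    return lines, killed, proc.returncode


if __name__ == "__main__":
    print(run_and_watch(FAKE_CHILD, r"FATAL"))
```

Breaking out of the read loop matters as much as the kill itself: if some other process still holds the write end of the pipe open, a further read could block even after the process that was started has died.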

Thanks for the help!
