I am calling a Perl script from Python 3.7.3, with subprocess. The Perl script that is called is this one:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl
And the code I am using to call it is:
import sys
import os
import subprocess
import threading

def copy_out(source, dest):
    for line in source:
        dest.write(line)

num_threads = 4
args = ["perl", "tokenizer.perl",
        "-l", "en",
        "-threads", str(num_threads)]

with open(os.devnull, "wb") as devnull:
    tokenizer = subprocess.Popen(
        args,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=devnull)

tokenizer_thread = threading.Thread(
    target=copy_out, args=(tokenizer.stdout, open("outfile", "wb")))
tokenizer_thread.start()

num_lines = 100000
for _ in range(num_lines):
    tokenizer.stdin.write(b'Random line.\n')
tokenizer.stdin.close()

tokenizer_thread.join()
tokenizer.wait()
On my system, this leads to the following error:
Traceback (most recent call last):
  File "t.py", line 27, in <module>
    tokenizer.stdin.write(b'Random line.\n')
BrokenPipeError: [Errno 32] Broken pipe
I investigated this, and it turns out that if the -threads argument for the subprocess is 1, the error is not thrown. As I don't want to give up on multithreading in the child process, my question is:
What is causing this error in the first place? "Who" is to blame for it: OS / environment, my Python code, the Perl code?
I am glad to provide more information if needed.
EDIT: To respond to some comments: if I make num_lines smaller, I do not get this error anymore. num_threads only controls the number of threads of the Perl subprocess; I never start several Python threads, just one.

EDIT 2: In my first edit, I incorrectly stated that this Perl program runs fine when called with e.g. -threads 4 from the command line: there, a different Perl was used that is compiled with multithreading. If I use the same Perl that is invoked from Python, I get:
$ cat [file with 100000 lines] | [correct perl] tokenizer.perl -l en -threads 4
Can't locate object method "new" via package "Thread" at
tokenizer.perl line 130, <STDIN> line 8000.
Which no doubt would have helped me debug this better.
The problem seems to be that the Perl script crashes if perl does not support threads. You can check whether your perl supports threads by running:
perl -MConfig -E 'say "Threads supported" if $Config{useithreads}'
In my case, the output was empty so I installed a new perl with thread support:
perlbrew install perl-5.30.0 --as=5.30.0-threads -Dusethreads
perlbrew use 5.30.0-threads
Then I ran the Python script again:
import sys
import os
import subprocess
import threading

def copy_out(source, dest):
    for line in iter(source.readline, b''):
        dest.write(line)

num_threads = 4
args = ["perl", "tokenizer.perl",
        "-l", "en",
        "-threads", str(num_threads)]

tokenizer = subprocess.Popen(
    args,
    bufsize=-1,  # use the default buffer size (io.DEFAULT_BUFFER_SIZE, typically 8192 bytes)
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL)

tokenizer_thread = threading.Thread(
    target=copy_out, args=(tokenizer.stdout, open("outfile", "wb")))
tokenizer_thread.start()

num_lines = 100000
for _ in range(num_lines):
    tokenizer.stdin.write(b'Random line.\n')
tokenizer.stdin.close()

tokenizer_thread.join()
tokenizer.wait()
and it now ran to the end with no errors and produced the output file outfile
with 100000 lines.
What is causing this error in the first place?
Writing to a closed pipe causes the OS to send SIGPIPE to the process calling write. This allows programs to work as generators. For example, the following won't run forever despite containing an infinite loop, because head will exit and close its STDIN after reading ten lines, leading to perl receiving a SIGPIPE.
perl -le'1 while print ++$i;' | head
If the SIGPIPE signal is being ignored, the write system call will return EPIPE (Broken pipe) instead. The following won't run forever either, because print returns the error EPIPE once head exits.
perl -le'$SIG{PIPE}="IGNORE"; 1 while print ++$i;' | head
From the fact that your Python program received an EPIPE error, we deduce two facts:

Your Python program ignores SIGPIPE signals (Python does this at startup, turning a failed write into BrokenPipeError instead), and
the read end of the pipe attached to the Perl program's STDIN was closed.

So we must ask ourselves: why would the Perl program close its STDIN? It's very unlikely that its STDIN was closed explicitly. By far the most likely explanation is that the child process was terminated.
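This is easy to reproduce in isolation. The following sketch (an illustration, not the original tokenizer setup) writes to a child process that exits without ever reading its stdin; because Python ignores SIGPIPE by default, the failed write surfaces as BrokenPipeError with errno set to EPIPE:

```python
import errno
import subprocess
import sys

# A child that exits immediately without reading its stdin,
# standing in for a crashed tokenizer.perl.
child = subprocess.Popen([sys.executable, "-c", "pass"],
                         stdin=subprocess.PIPE)
child.wait()  # the child is gone, so the pipe has no reader anymore

got_epipe = False
try:
    # Write enough data to overflow the kernel pipe buffer: the write
    # fails with EPIPE, which Python raises as BrokenPipeError because
    # it ignores SIGPIPE by default.
    for _ in range(100000):
        child.stdin.write(b"Random line.\n")
    child.stdin.flush()
except BrokenPipeError as e:
    got_epipe = (e.errno == errno.EPIPE)
    print("BrokenPipeError, errno == EPIPE:", got_epipe)
```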
"Who" is to blame for it: OS / environment, my Python code, the Perl code?
That depends on what caused the Perl program to exit. The first thing to do is figure out what exit status was returned by the child process. Depending on the exit status, we'll know whether the child was killed by a signal, exited with an error, or completed successfully:
If the exit code tells us the process was killed by a signal, it will also tell us which signal. This could give us some information. (This would be the hardest of the three scenarios to debug.)
If the exit code tells us the process returned an error, the error code itself might not contain any additional useful information, but an error message was surely sent to the child's STDERR to provide more information.
If the exit code tells us the process completed successfully, perhaps the arguments or input you are providing don't mean what you think they mean.
So make sure to call tokenizer.wait() to collect the exit status, which is then stored in tokenizer.returncode. Also make sure to log what is being sent to STDERR.
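As a sketch of that debugging step, capture stderr instead of discarding it and inspect returncode after wait(). The failing child below is a stand-in for tokenizer.perl; its error message and exit code are made up for illustration:

```python
import subprocess
import sys

# Stand-in child: writes a diagnostic to stderr and exits non-zero,
# as a crashing tokenizer.perl would.
child_code = ("import sys; "
              "sys.stderr.write('fatal: no thread support\\n'); "
              "sys.exit(13)")

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)  # capture stderr rather than sending it to devnull

out, err = proc.communicate()
# A positive returncode is the child's exit status; a negative value -N
# means the child was killed by signal N.
print("returncode:", proc.returncode)
print("stderr:", err.decode().strip())
```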