How to pipe stdin/stdout to a Perl script using Python

Question

This Python code pipes data through a Perl script just fine.

import subprocess
kw = {}
kw['executable'] = None
kw['shell'] = True
kw['stdin'] = None
kw['stdout'] = subprocess.PIPE
kw['stderr'] = subprocess.PIPE
args = ' '.join(['/usr/bin/perl','-w','/path/script.perl','<','/path/mydata'])
subproc = subprocess.Popen(args,**kw)
for line in iter(subproc.stdout.readline, ''):
    print line.rstrip().decode('UTF-8')

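However, it requires me to first save the buffer to a disk file (/path/mydata). It would be cleaner to loop over the data in the Python code and pass it to the subprocess line by line: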

import codecs
import subprocess
kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE
args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args,**kw)
f = codecs.open('/path/mydata','r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print line.strip()  ### code hangs after printing this ###
    for line in iter(subproc.stdout.readline, ''):
        print line.rstrip().decode('UTF-8')
subproc.terminate()
f.close()

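After the first line is sent to the subprocess, the code hangs at readline. I have other executables that work with this exact same code.

My data files can be very large (1.5 GB). Is there a way to pipe the data through without saving it to a file? I don't want to rewrite the Perl script, because it has to stay compatible with other systems.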

Answer 1

Your code blocks at this line:

for line in iter(subproc.stdout.readline, ''):

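because the only way this iteration can terminate is when EOF (end of file) is reached, which happens when the subprocess terminates. You don't want to wait until the process terminates, however; you only want to wait until it has finished processing the line that was sent to it.

Furthermore, as Chris Morgan has already pointed out, you are running into buffering problems. Another question on stackoverflow discusses how to do a non-blocking read with subprocess. I have made a quick and dirty adaptation of the code from that question to your problem: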
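
To make the EOF point concrete, here is a minimal Python 3 sketch. It is not the asker's setup: the `CHILD` filter and the `run_small_batch` helper are hypothetical stand-ins (a tiny Python child instead of the Perl script). The read loop only finishes because stdin is closed first, which lets the child see EOF, flush its output, and exit. This ordering is only safe for small inputs; with a large file the pipes can fill up before the read loop starts.

```python
import subprocess
import sys

# Hypothetical stand-in for the Perl script: upper-cases every stdin line.
CHILD = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())\n"

def run_small_batch(lines):
    """Send a small batch, close stdin, then read the child's stdout to EOF."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    for line in lines:
        proc.stdin.write((line + "\n").encode("UTF-8"))
    # Closing stdin is what lets the child see EOF, flush, and exit.
    proc.stdin.close()
    # Pipes are in bytes mode here, so readline returns b"" at EOF.
    result = [raw.decode("UTF-8").rstrip("\n")
              for raw in iter(proc.stdout.readline, b"")]
    proc.stdout.close()
    proc.wait()
    return result
```

Without the `proc.stdin.close()` call, the child never sees EOF, so the `iter(..., b"")` loop would wait forever, which matches the hang described in the question.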

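import codecs
import Queue
import subprocess
import threading

def enqueue_output(out, queue):
    # Runs in a background thread: push each stdout line onto the queue until EOF.
    for line in iter(out.readline, ''):
        queue.put(line)
    out.close()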

kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE
args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args, **kw)
f = codecs.open('/path/mydata','r','UTF-8')
q = Queue.Queue()
t = threading.Thread(target = enqueue_output, args = (subproc.stdout, q))
t.daemon = True
t.start()
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print "Sent:", line.strip()
    try:
        line = q.get_nowait()
    except Queue.Empty:
        pass
    else:
        print "Received:", line.rstrip().decode('UTF-8')

subproc.terminate()
f.close()

You will most likely need to modify this code, but at least it doesn't block.
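
For readers on Python 3, the same idea might look roughly like the sketch below. The module is `queue` rather than `Queue`, and the readline sentinel is `b""` because the pipes carry bytes. The `CHILD` filter and the `echo_lines` helper are hypothetical; the child here flushes after every line, which a real Perl script may not do.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for the Perl script; it flushes after every line.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)

def enqueue_output(out, q):
    # Background thread: forward each stdout line to the queue until EOF.
    for raw in iter(out.readline, b""):
        q.put(raw)
    out.close()

def echo_lines(lines, timeout=5.0):
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    q = queue.Queue()
    t = threading.Thread(target=enqueue_output, args=(proc.stdout, q), daemon=True)
    t.start()
    received = []
    for line in lines:
        proc.stdin.write((line + "\n").encode("UTF-8"))
        proc.stdin.flush()  # push the line out of Python's own write buffer
        try:
            raw = q.get(timeout=timeout)
        except queue.Empty:
            continue  # the child is buffering; pick the output up later
        received.append(raw.decode("UTF-8").rstrip("\n"))
    proc.stdin.close()  # EOF for the child, so the reader thread can finish
    t.join()
    while not q.empty():  # collect anything that arrived late
        received.append(q.get_nowait().decode("UTF-8").rstrip("\n"))
    proc.wait()
    return received
```

If the child buffers its output, the `q.get` calls simply time out and everything is collected in the final drain after stdin is closed.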

Answer 2

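Thanks srgerg. I had also tried a threaded solution. However, that solution on its own always hung. Both my earlier code and srgerg's code were missing the final piece, and your tip gave me the last idea.

The final solution writes enough dummy data to force the last valid lines out of the buffer. To support this, I added code that tracks how many valid lines have been written to stdin. The thread loop opens the output file, saves the data, and breaks when the number of lines read equals the number of valid input lines. This solution ensures that a file of any size is read and written line by line.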

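import codecs
import subprocess
import threading

# kw and args are assumed to be defined as in the question's second snippet.
def std_output(stdout,outfile=''):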
    out = 0
    f = codecs.open(outfile,'w','UTF-8')
    for line in iter(stdout.readline, ''):
        f.write('%s\n'%(line.rstrip().decode('UTF-8')))
        out += 1
        if i == out: break
    stdout.close()
    f.close()

outfile = '/path/myout'
infile = '/path/mydata'

subproc = subprocess.Popen(args,**kw)
t = threading.Thread(target=std_output,args=[subproc.stdout,outfile])
t.daemon = True
t.start()

i = 0
f = codecs.open(infile,'r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    i += 1
subproc.stdin.write('%s\n'%(' '*4096)) ### push dummy data ###
f.close()
t.join()
subproc.terminate()
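
As a possible alternative to the dummy padding, closing stdin also forces the child to flush, provided the script exits when it reaches EOF on its input (an assumption about the Perl script). The Python 3 sketch below uses a hypothetical `CHILD` filter and `pump` helper: a reader thread drains stdout while the main thread writes, so neither pipe can fill up and stall the other side, and the reader simply runs to EOF instead of comparing line counters.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for the Perl script: upper-cases every stdin line.
CHILD = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())\n"

def pump(src_lines, sink):
    """Stream src_lines through the child; append each output line to sink."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )

    def drain():
        # Reader thread: consume stdout until the child closes it (EOF).
        for raw in iter(proc.stdout.readline, b""):
            sink.append(raw.decode("UTF-8").rstrip("\n"))
        proc.stdout.close()

    reader = threading.Thread(target=drain)
    reader.start()
    count = 0
    for line in src_lines:
        proc.stdin.write((line.strip() + "\n").encode("UTF-8"))
        count += 1
    proc.stdin.close()  # signals EOF; the child flushes and exits
    reader.join()
    proc.wait()
    return count
```

In the asker's setting, `src_lines` would be the open input file and `sink.append` would be replaced by a write to the output file, so only one line at a time is held in memory.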

Answer 3

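See the warning in the manual about using Popen.stdin and Popen.stdout (just above Popen.stdin):

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

I realize that holding a gigabyte-and-a-half string in memory all at once isn't ideal, but communicate() is an approach that will work, whereas, as you have observed, the stdin.write() + stdout.read() approach can deadlock once the OS pipe buffers fill up.

Is using communicate() feasible?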
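
For reference, a minimal Python 3 sketch of the communicate() route. The `CHILD` filter and the `filter_with_communicate` helper are hypothetical stand-ins for the Perl script and the surrounding code. Note that the whole input and the whole output are held in memory at once, which is the trade-off mentioned above.

```python
import subprocess
import sys

# Hypothetical stand-in for the Perl script: upper-cases every stdin line.
CHILD = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())\n"

def filter_with_communicate(text):
    """Feed text to the child in one go and collect everything it prints."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the input, closes stdin, and reads both output
    # pipes concurrently, so no pipe buffer can fill up and deadlock.
    out, err = proc.communicate(text.encode("UTF-8"))
    return out.decode("UTF-8"), err.decode("UTF-8"), proc.returncode
```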

3 answers

Answer 1
1 2012-01-02 08:51:46

Answer 2
1 accepted 2012-01-02 14:26:24

Answer 3
0 2012-01-02 08:35:13
