How to pass stdin/stdout to a Perl script using Python

Question

This Python code pipes data through a Perl script just fine.

import subprocess
kw = {}
kw['executable'] = None
kw['shell'] = True
kw['stdin'] = None
kw['stdout'] = subprocess.PIPE
kw['stderr'] = subprocess.PIPE
args = ' '.join(['/usr/bin/perl','-w','/path/script.perl','<','/path/mydata'])
subproc = subprocess.Popen(args,**kw)
for line in iter(subproc.stdout.readline, ''):
    print line.rstrip().decode('UTF-8')
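For comparison, here is a minimal Python 3 sketch of the same pipeline without shell=True. subprocess.run (like Popen) accepts an open file object as stdin, so the shell's < redirection is not needed. UPPERCASE_FILTER is a hypothetical stand-in (a Python child that upper-cases each line) for ['/usr/bin/perl', '-w', '/path/script.perl'], and demo_with_temp_file is a hypothetical helper for trying it out.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for ['/usr/bin/perl', '-w', '/path/script.perl']:
# a Python child process that upper-cases every line it reads.
UPPERCASE_FILTER = [sys.executable, '-c',
                    'import sys\n'
                    'for line in sys.stdin:\n'
                    '    sys.stdout.write(line.upper())\n']


def run_filter_on_file(cmd, path):
    """Run cmd with the file at path as its stdin and return its output lines."""
    with open(path, 'rb') as infile:
        # An open file object can be passed directly as stdin,
        # so no shell and no '<' redirection are involved.
        result = subprocess.run(cmd, stdin=infile, stdout=subprocess.PIPE,
                                check=True)
    return result.stdout.decode('utf-8').splitlines()


def demo_with_temp_file(text):
    """Hypothetical helper: write text to a temporary file, then filter it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as tmp:
            tmp.write(text)
        return run_filter_on_file(UPPERCASE_FILTER, path)
    finally:
        os.remove(path)
```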

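However, this requires that I first save the buffer to a disk file (/path/mydata). It would be cleaner to loop over the data in the Python code and pass it to the subprocess line by line: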

import codecs
import subprocess
kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE
args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args,**kw)
f = codecs.open('/path/mydata','r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print line.strip()  ### code hangs after printing this ###
    for line in iter(subproc.stdout.readline, ''):
        print line.rstrip().decode('UTF-8')
subproc.terminate()
f.close()

After the first line is sent to the subprocess, the code hangs on readline. I have other executables that work perfectly with this exact same code.

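My data files can be very large (1.5 GB). Is there a way to pipe the data without saving it to a file first? I don't want to rewrite the Perl script for compatibility with other systems.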

Answer 1

Your code blocks at this line:

for line in iter(subproc.stdout.readline, ''):

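because the only way this iteration can end is when EOF (end of file) is reached, which happens when the subprocess terminates. You don't want to wait until the process terminates, however; you only want to wait until it has finished processing the line you sent it.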
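If the script is known to answer every input line with exactly one output line, the loop can read a single line per line sent instead of iterating until EOF. The following is a minimal Python 3 sketch under that assumption. LINE_FILTER is a hypothetical stand-in child that flushes after every line; a Perl script would need autoflush ($| = 1) to behave the same way, which is an assumption about the script, not something the question confirms.

```python
import subprocess
import sys

# Hypothetical stand-in for the Perl script: upper-cases each line and
# flushes immediately (the Perl equivalent would be setting $| = 1).
LINE_FILTER = [sys.executable, '-c',
               'import sys\n'
               'for line in sys.stdin:\n'
               '    sys.stdout.write(line.upper())\n'
               '    sys.stdout.flush()\n']


def query_per_line(cmd, lines):
    """Send lines one at a time and read exactly one reply line after each.

    Assumes the child emits exactly one line per input line and flushes it;
    otherwise the readline() call below blocks.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, universal_newlines=True)
    replies = []
    try:
        for line in lines:
            proc.stdin.write(line.rstrip('\n') + '\n')
            proc.stdin.flush()  # push the line past Python's own write buffer
            # Read one line only, rather than looping until EOF.
            replies.append(proc.stdout.readline().rstrip('\n'))
    finally:
        proc.stdin.close()  # EOF lets the child's read loop finish
        proc.stdout.close()
        proc.wait()
    return replies
```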

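Also, as Chris Morgan already pointed out, you are running into a buffering problem. Another question on Stack Overflow discusses how to do non-blocking reads from a subprocess. I have quickly and dirtily adapted the code from that question to your problem: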

import codecs
import subprocess
import threading
import Queue  # Python 2 module name ("queue" in Python 3)

def enqueue_output(out, queue):
    # Runs in a background thread, so the blocking readline() calls
    # never stall the main loop below.
    for line in iter(out.readline, ''):
        queue.put(line)
    out.close()

kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE
# Note: with 'executable' set, args[0] ('-w') is used only as the
# program name, so perl never actually receives the -w switch.
args = ['-w', '/path/script.perl']
subproc = subprocess.Popen(args, **kw)
f = codecs.open('/path/mydata', 'r', 'UTF-8')
q = Queue.Queue()
t = threading.Thread(target=enqueue_output, args=(subproc.stdout, q))
t.daemon = True
t.start()
for line in f:
    subproc.stdin.write('%s\n' % (line.strip().encode('UTF-8')))
    print "Sent:", line.strip()
    try:
        line = q.get_nowait()
    except Queue.Empty:
        pass  # no output available yet; keep sending
    else:
        print "Received:", line.rstrip().decode('UTF-8')

subproc.terminate()
f.close()

You will most likely need to modify this code, but at least it doesn't block.
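A variation on the same threading idea, sketched here for Python 3, moves the writing into the background thread and reads stdout in the main thread. Because writing and reading happen concurrently, neither pipe can fill up and stall the other, and closing stdin after the last line gives the child an EOF, so it finishes its loop and flushes whatever it still has buffered. BUFFERED_FILTER is a hypothetical stand-in child that keeps its default (block) buffering, like a Perl script without $| set; this is a sketch, not the answer's exact method.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for ['/usr/bin/perl', '-w', '/path/script.perl']:
# upper-cases each line and does NOT flush, so its output is block-buffered.
BUFFERED_FILTER = [sys.executable, '-c',
                   'import sys\n'
                   'for line in sys.stdin:\n'
                   '    sys.stdout.write(line.upper())\n']


def stream_through(cmd, lines):
    """Feed lines to cmd from a writer thread; yield its output lines as they arrive."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, encoding='utf-8')

    def feed():
        # Writer thread: it may block on a full stdin pipe
        # without stopping the main thread from draining stdout.
        try:
            for line in lines:
                proc.stdin.write(line.rstrip('\n') + '\n')
            proc.stdin.close()  # EOF: the child finishes its loop and flushes
        except BrokenPipeError:
            pass  # the child exited early; the reader below will see EOF

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    for out_line in proc.stdout:  # main thread: consume output as it arrives
        yield out_line.rstrip('\n')
    writer.join()
    proc.stdout.close()
    if proc.wait() != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
```

With the question's file, usage would look like: for out in stream_through(['/usr/bin/perl', '-w', '/path/script.perl'], open('/path/mydata', encoding='utf-8')): print(out). Only the lines currently in flight are held in memory, so file size is not a concern.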

Answer 2

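Thanks, srgerg. I had also tried the threading solution. On its own, however, that solution always hung. Both my previous code and srgerg's code were missing the final piece, and your hint gave me one last idea.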

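The final solution writes enough dummy data to force the last valid line out of the buffer. To support this, I added code that tracks how many valid lines have been written to stdin. The thread loop opens the output file, saves the data, and breaks once the number of lines read equals the number of valid input lines. This solution ensures that files of any size are read and written line by line.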

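import codecs
import subprocess
import threading

def std_output(stdout, outfile=''):
    # Copy the subprocess's output to outfile, stopping once as many lines
    # have been read as valid lines were written (global counter i).
    out = 0
    f = codecs.open(outfile, 'w', 'UTF-8')
    for line in iter(stdout.readline, ''):
        f.write('%s\n' % (line.rstrip().decode('UTF-8')))
        out += 1
        if i == out:  # i is updated concurrently by the main thread below
            break
    stdout.close()
    f.close()

outfile = '/path/myout'
infile = '/path/mydata'

i = 0  # valid lines written so far; defined before the thread can read it

# args and kw as in the question's second example
subproc = subprocess.Popen(args, **kw)
t = threading.Thread(target=std_output, args=[subproc.stdout, outfile])
t.daemon = True
t.start()

f = codecs.open(infile, 'r', 'UTF-8')
for line in f:
    subproc.stdin.write('%s\n' % (line.strip().encode('UTF-8')))
    i += 1
subproc.stdin.write('%s\n' % (' ' * 4096))  ### push dummy data ###
f.close()
t.join()
subproc.terminate()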

Answer 3

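See the warning in the manual about using Popen.stdin and Popen.stdout (just above Popen.stdin):

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

I realize that holding a gigabyte-and-a-half string in memory all at once is not ideal, but using communicate() is a way that will work, whereas, as you have observed, the stdin.write() + stdout.read() approach can deadlock once the OS pipe buffer fills up.

Is using communicate() feasible for you?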
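A minimal Python 3 sketch of the communicate() approach: it writes all of stdin and drains stdout and stderr concurrently, so the pipes cannot deadlock, but the whole input and the whole output are held in memory. UPPER_FILTER is a hypothetical stand-in for ['/usr/bin/perl', '-w', '/path/script.perl'], and filter_all_at_once is a hypothetical helper name.

```python
import subprocess
import sys

# Hypothetical stand-in for ['/usr/bin/perl', '-w', '/path/script.perl'].
UPPER_FILTER = [sys.executable, '-c',
                'import sys\n'
                'for line in sys.stdin:\n'
                '    sys.stdout.write(line.upper())\n']


def filter_all_at_once(cmd, text):
    """Pass all of text to cmd in a single communicate() call and return its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, encoding='utf-8')
    # communicate() feeds stdin and reads stdout/stderr at the same time,
    # then waits for the process to exit.
    out, err = proc.communicate(text)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd,
                                            output=out, stderr=err)
    return out
```

For a 1.5 GB file this means reading it into one string first (for example open('/path/mydata', encoding='utf-8').read()), which is exactly the memory cost mentioned above.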

3 solutions

Solution 1
1 vote, 2012-01-02 08:51:46

Solution 2
1 vote, accepted, 2012-01-02 14:26:24

Solution 3
0 votes, 2012-01-02 08:35:13