Python: execute cat subprocess in parallel

I am running several cat | zgrep commands on a remote server and collecting their output individually for further processing:

import multiprocessing as mp
import subprocess

class MainProcessor(mp.Process):
    def __init__(self, peaks_array):
        super(MainProcessor, self).__init__()
        self.peaks_array = peaks_array

    def run(self):
        for peak_arr in self.peaks_array:
            peak_processor = PeakProcessor(peak_arr)
            peak_processor.start()

class PeakProcessor(mp.Process):
    def __init__(self, peak_arr):
        super(PeakProcessor, self).__init__()
        self.peak_arr = peak_arr

    def run(self):
        command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex" '
        log_lines = (subprocess.check_output(command, shell=True)).split('\n')
        process_data(log_lines)

This, however, results in sequential execution of the subprocess ('ssh ... cat ...') commands: the second peak waits for the first to finish, and so on.

How can I modify this code so that the subprocess calls run in parallel, while still being able to collect the output of each one individually?

You need neither multiprocessing nor threading to run subprocesses in parallel, e.g.:

#!/usr/bin/env python
from subprocess import Popen

# run commands in parallel
processes = [Popen("echo {i:d}; sleep 2; echo {i:d}".format(i=i), shell=True)
             for i in range(5)]
# collect statuses
exitcodes = [p.wait() for p in processes]

It runs 5 shell commands simultaneously. Note: neither threads nor the multiprocessing module are used here. There is no point in appending an ampersand & to the shell commands: Popen doesn't wait for the command to complete, so you need to call .wait() explicitly.

It is convenient, but not necessary, to use threads to collect the output from the subprocesses:

#!/usr/bin/env python
from multiprocessing.dummy import Pool # thread pool
from subprocess import Popen, PIPE, STDOUT

# run commands in parallel
processes = [Popen("echo {i:d}; sleep 2; echo {i:d}".format(i=i), shell=True,
                   stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
             for i in range(5)]

# collect output in parallel
def get_lines(process):
    return process.communicate()[0].splitlines()

outputs = Pool(len(processes)).map(get_lines, processes)

Related: Python threading multiple bash subprocesses?
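Applied to the question's setup, the same pattern could look roughly like this. This is only a sketch: the host, file names and regex are the placeholders from the question, and the per-peak processing is left as a comment.

#!/usr/bin/env python
# Sketch: the Popen + thread-pool pattern applied to the question's
# ssh ... | zgrep pipelines.
from multiprocessing.dummy import Pool  # thread pool
from subprocess import Popen, PIPE, STDOUT

peak_commands = [
    'ssh remote_host cat files_to_process | zgrep --mmap "regex"',
    # ... one command per peak ...
]

# start all remote pipelines at once; Popen does not block
processes = [Popen(cmd, shell=True, stdout=PIPE, stderr=STDOUT)
             for cmd in peak_commands]

def get_lines(process):
    # communicate() waits for this pipeline and returns its full output
    return process.communicate()[0].splitlines()

# read every pipeline's output concurrently, one worker thread per pipeline
outputs = Pool(len(processes)).map(get_lines, processes)
# outputs[i] holds the log lines for peak_commands[i] and can be passed
# to process_data(...) as in the question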

Here's a code example that gets output from several subprocesses concurrently in the same thread, using asyncio:

#!/usr/bin/env python3
import asyncio
import sys
from asyncio.subprocess import PIPE, STDOUT

@asyncio.coroutine
def get_lines(shell_command):
    p = yield from asyncio.create_subprocess_shell(shell_command,
            stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    return (yield from p.communicate())[0].splitlines()

if sys.platform.startswith('win'):
    loop = asyncio.ProactorEventLoop() # for subprocess' pipes on Windows
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()

# get commands output in parallel
coros = [get_lines('"{e}" -c "print({i:d}); import time; time.sleep({i:d})"'
                    .format(i=i, e=sys.executable)) for i in range(5)]
print(loop.run_until_complete(asyncio.gather(*coros)))
loop.close()
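Note that @asyncio.coroutine and yield from-style coroutines were removed in Python 3.11. On current Python versions the same example can be written with async/await and asyncio.run(), which also picks a suitable event loop on Windows. A minimal equivalent sketch:

#!/usr/bin/env python3
import asyncio
import sys
from asyncio.subprocess import PIPE, STDOUT

async def get_lines(shell_command):
    p = await asyncio.create_subprocess_shell(
        shell_command, stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    return (await p.communicate())[0].splitlines()

async def main():
    # get commands' output in parallel
    coros = [get_lines('"{e}" -c "print({i:d}); import time; time.sleep({i:d})"'
                       .format(i=i, e=sys.executable)) for i in range(5)]
    print(await asyncio.gather(*coros))

asyncio.run(main())  # manages the event loop, including on Windows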

An alternative approach (rather than putting the shell processes in the background, as suggested elsewhere) is to use multithreading.

The run method that you have would then do something like this (note that the thread module is named _thread in Python 3):

thread.start_new_thread(myFuncThatDoesZGrep, ())

To collect the results, you can do something like this:

import threading

class MyThread(threading.Thread):
    def run(self):
        self.finished = False
        # Your code to run the command here.
        blahBlah()
        # When finished....
        self.results = []  # store whatever blahBlah() produced
        self.finished = True  # set last, so results are ready when finished is True

Start such a thread as usual with its start() method. When the thread object has myThread.finished == True, you can collect the results via myThread.results; a self-contained sketch follows below.
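Putting the pieces together, here is a minimal sketch of this approach. ZGrepThread and the command list are illustrative (reusing the question's placeholder pipeline), and join() is used instead of polling finished.

#!/usr/bin/env python
# Sketch: one thread per command; each thread stores its output on the
# thread object, and the main thread collects the results afterwards.
import subprocess
import threading

class ZGrepThread(threading.Thread):  # illustrative name
    def __init__(self, command):
        super(ZGrepThread, self).__init__()
        self.command = command
        self.finished = False
        self.results = []

    def run(self):
        output = subprocess.check_output(self.command, shell=True)
        self.results = output.splitlines()
        self.finished = True

commands = ['ssh remote_host cat files_to_process | zgrep --mmap "regex"']
threads = [ZGrepThread(cmd) for cmd in commands]
for t in threads:
    t.start()
for t in threads:
    t.join()  # or poll t.finished in a loop, as described above
    print(t.results)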
