Python: spreading subprocess.call across multiple CPU cores

Question

I have Python code that uses the subprocess package to run a script in the shell:

subprocess.call(mycode.py, shell=inshell)

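When I run the top command, I see that I am only using about 30% of the CPU or less. I realize that some commands may be using the disk rather than the CPU, so I am timing the run. It seems to run slower on the Linux system than on a 2-core Mac.

How can I parallelize this with the threading or multiprocessing package so that I can use multiple CPU cores on that Linux system?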

Answer 1

To parallelize the work done in mycode.py, you need to organize the code so that it fits this basic pattern:

# Import the kind of pool you want to use (processes or threads).
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

# Collect work items as an iterable of single values (eg tuples, 
# dicts, or objects). If you can't hold all items in memory,
# define a function that yields work items instead.
work_items = [
    (1, 'A', True),
    (2, 'X', False),
    ...
]

# Define a callable to do the work. It should take one work item.
def worker(tup):
    # Do the work.
    ...

    # Return any results.
    ...

# Create a ThreadPool (or a process Pool) of desired size.
# What size? Experiment. Slowly increase until it stops helping.
pool = ThreadPool(4)

# Do work and collect results.
# Or use pool.imap() or pool.imap_unordered().
work_results = pool.map(worker, work_items)

# Wrap up.
pool.close()
pool.join()

---------------------

# Or, in Python 3.3+ you can do it like this, skipping the wrap-up code.
with ThreadPool(4) as pool:
    work_results = pool.map(worker, work_items)
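
Applied to the question's subprocess.call, the pattern above might look like the following sketch. The command list and the names run_command and run_all are made up for illustration; a real list would name your own script and arguments. A thread pool is used on the assumption that each thread only waits on its child process, so the children themselves are free to run on separate cores.

```python
import subprocess
import sys
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical work items: each one is the argument list for one child process.
# A real list would name your own script, e.g. [sys.executable, "mycode.py", arg].
commands = [
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]

def run_command(cmd):
    # subprocess.call blocks until the child exits and returns its exit code.
    return subprocess.call(cmd)

def run_all(cmds, pool_size=4):
    # Each thread just waits on a separate OS process, and the kernel can
    # schedule those processes on different cores.
    with ThreadPool(pool_size) as pool:
        # pool.map returns the exit codes in the same order as cmds.
        return pool.map(run_command, cmds)

if __name__ == "__main__":
    print(run_all(commands))
```

Passing an argument list instead of a shell string avoids needing shell=True; whether that suits your script depends on how it is launched.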

Answer 2

A slight modification of FMc's answer:

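from multiprocessing import Pool
import time

work_items = [(1, 'A', True), (2, 'X', False), (3, 'B', False)]

def worker(tup):
    # Simulate heavy work: print the whole list 5,000,000 times.
    for i in range(5000000):
        print(work_items)
    return

# Time pool.map() across a pool of 8 worker processes.
pool = Pool(processes=8)
start = time.time()
work_results = pool.map(worker, work_items)
end = time.time()
print(end - start)
pool.close()
pool.join()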

The code above takes 53.60 seconds. The trick below, however, takes 27.34 seconds.

from multiprocessing import Pool
from functools import partial
import time

work_items = [(1, 'A', True), (2, 'X', False), (3, 'B', False)]

def worker(tup):
    for i in range(5000000):
        print(work_items)
    return

def parallel_attribute(worker):
    # Return a callable that maps `worker` over work items in a pool.
    def easy_parallelize(worker, work_items):
        pool = Pool(processes=8)
        work_results = pool.map(worker, work_items)
        pool.close()
        pool.join()
    return partial(easy_parallelize, worker)

start = time.time()
# Note: worker(work_items) runs the loop once in this process, and the
# partial returned by parallel_attribute() is never called, so no Pool
# is created during the timed section.
worker.parallel = parallel_attribute(worker(work_items))
end = time.time()
print(end - start)

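Two comments: 1) I did not see much difference when using multiprocessing.dummy. 2) Using Python's partial function (with a nested scope) works as a nice wrapper, and in my timing above it cut the run time roughly in half. Reference: https://www.binpress.com/tutorial/simple-python-parallelism/121

Also, thanks FMc!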
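
For comparison, here is a minimal sketch of a partial-based wrapper in which the bound callable is actually invoked with the work items. The names map_in_pool, slow_square, and timed_run are hypothetical, and a thread pool is assumed so the example has no pickling requirements. Passing pool_cls=multiprocessing.Pool would use worker processes instead, provided the worker is defined at module level.

```python
import time
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool

def map_in_pool(worker, work_items, pool_cls=ThreadPool, size=4):
    # Create the pool, map the worker over the items, and clean up on exit.
    with pool_cls(size) as pool:
        return pool.map(worker, work_items)

def parallel_attribute(worker, **pool_options):
    # Bind the worker now; the work items are supplied at call time.
    return partial(map_in_pool, worker, **pool_options)

def slow_square(n):
    # Hypothetical stand-in for real work: sleep briefly, then compute.
    time.sleep(0.1)
    return n * n

# Pass the function itself, not the result of calling it.
slow_square.parallel = parallel_attribute(slow_square, size=4)

def timed_run(items):
    # Time only the pooled call, and return its results with the duration.
    start = time.time()
    results = slow_square.parallel(items)
    return results, time.time() - start

if __name__ == "__main__":
    results, elapsed = timed_run([1, 2, 3, 4])
    print(results, round(elapsed, 2))
```

The key difference is that parallel_attribute receives the worker function and the returned partial is then called with the work items, so the timed section covers the pooled run.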

Answer 3

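Well, you can first create threads and pass each one the function you want to parallelize. Inside that function, you make the subprocess call.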

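import threading
import subprocess

def worker():
    """thread worker function"""
    print('Worker')
    # mycode.py and inshell are the placeholders from the question.
    subprocess.call(mycode.py, shell=inshell)

# Start five threads; each one launches its own subprocess.
threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

# Wait for all threads (and their subprocesses) to finish.
for t in threads:
    t.join()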

Answer 1
2 2017-01-13 01:56:33

Answer 2
1 accepted 2017-01-13 22:25:13

Answer 3
0 2017-01-13 01:41:24
