Python: multiprocess workers, tracking tasks completed (missing completions)
The default multiprocessing.Pool code includes a counter that tracks how many tasks a worker has completed:
completed += 1
logging.debug('worker exiting after %d tasks' % completed)
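However, changing the argument of pool.map from range(12) to range(20) makes the counter come out wrong (this does not seem to be related to worker creation), and I am not sure what causes it.
For example: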
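import multiprocessing as mp
def ret_x(x):
    return x
def inform():
    # Runs once in every newly started worker process.
    print('made a worker!')
if __name__ == '__main__':  # guard needed where workers are spawned rather than forked
    pool = mp.Pool(2, maxtasksperchild=2, initializer=inform)
    res = pool.map(ret_x, range(8))
    print(res)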
works as expected and gives:
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
[0, 1, 2, 3, 4, 5, 6, 7]
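However, changing the range to 20 does not show any additional workers being created, nor a total of 20 completed tasks, even though the full range is returned in the expected list: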
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
worker exiting after 1 tasks
It behaves this way because you did not explicitly set "chunksize" in pool.map:
map(func, iterable[, chunksize])
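This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
Source: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool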
For 8 items with len(pool) = 2, chunksize is 1 (divmod(8, 2 * 4) gives (1, 0)), so there are 8 tasks and you see (8 / 1) / 2 = 4 workers.
workers = (len of items / chunksize) / tasks per process
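For 20 items with len(pool) = 2, chunksize is 3 (divmod(20, 2 * 4) gives (2, 4), and a nonzero remainder bumps the chunksize up by one), so you see roughly (20 / 3) / 2 = 3.3 workers. In practice that is 7 chunks: three workers exit after 2 tasks each and a fourth is left holding 1, which matches the output above.
For 40 items, chunksize = 5, so workers = (40 / 5) / 2 = 4 workers.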
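To make the chunk counts above concrete, here is a minimal sketch of how an iterable can be cut into chunks of a given size. The helper name split_into_chunks is hypothetical and is not part of multiprocessing; it only illustrates the idea of "one chunk = one task".

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    """Yield tuples of at most `chunksize` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:  # iterator exhausted
            return
        yield chunk


chunks = list(split_into_chunks(range(20), 3))
print(len(chunks))  # 7 chunks, so 7 tasks from the pool's point of view
print(chunks[-1])   # (18, 19): the last chunk is shorter than the rest
```

With maxtasksperchild=2, those 7 tasks are what the per-worker counter sees, not the 20 individual items.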
If you want, you can set chunksize = 1:
res = pool.map(ret_x, range(20), 1)
and you will see (20 / 1) / 2 = 10 workers:
python mppp.py
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
So chunksize is, roughly speaking, the unit of work handed to a process.
How chunksize is calculated: https://hg.python.org/cpython/file/1c54def5947c/Lib/multiprocessing/pool.py#l305
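The arithmetic used in this answer can be sketched as a small standalone script. The function names default_chunksize and estimate_workers are made up for illustration, and the sketch assumes a non-empty input and the divmod(len(iterable), len(pool) * 4) rule described above; it does not call into multiprocessing itself.

```python
import math


def default_chunksize(n_items, pool_size):
    """divmod by pool_size * 4, rounding up when there is a remainder."""
    chunksize, extra = divmod(n_items, pool_size * 4)
    if extra:
        chunksize += 1
    return chunksize


def estimate_workers(n_items, pool_size, maxtasksperchild, chunksize=None):
    """Return (chunksize, number of tasks, rough number of workers)."""
    if chunksize is None:
        chunksize = default_chunksize(n_items, pool_size)
    n_tasks = math.ceil(n_items / chunksize)  # one task per chunk
    return chunksize, n_tasks, n_tasks / maxtasksperchild


for n in (8, 20, 40):
    print(n, estimate_workers(n, pool_size=2, maxtasksperchild=2))
# 8 (1, 8, 4.0)
# 20 (3, 7, 3.5)
# 40 (5, 8, 4.0)
print(estimate_workers(20, 2, 2, chunksize=1))  # (1, 20, 10.0)
```

The last value is only an estimate: a fractional result such as 3.5 means the final worker is still holding fewer than maxtasksperchild tasks when the map finishes.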