How to use Python multiprocessing Pool.map to fill numpy array in a for loop
I want to fill a 2D numpy array in a for loop and use multiprocessing to speed up the computation.
import numpy
from multiprocessing import Pool

array_2D = numpy.zeros((20,10))
pool = Pool(processes = 4)

def fill_array(start_val):
    return range(start_val,start_val+10)

list_start_vals = range(40,60)
for line in xrange(20):
    array_2D[line,:] = pool.map(fill_array,list_start_vals)
pool.close()

print array_2D
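When I run it, Python starts 4 subprocesses and occupies 4 CPU cores, but the execution never finishes and the array is not printed. If I try to write the array to disk, nothing happens.
Can anyone tell me why?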
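The following works. First, it is a good idea to protect the main part of your code inside a main block to avoid weird side effects. The result of pool.map() is a list containing the evaluation for each value in the iterator list_start_vals, so you do not have to create array_2D beforehand.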
import numpy as np
from multiprocessing import Pool

def fill_array(start_val):
    return list(range(start_val, start_val+10))

if __name__=='__main__':
    pool = Pool(processes=4)
    list_start_vals = range(40, 60)
    array_2D = np.array(pool.map(fill_array, list_start_vals))
    pool.close() # ATTENTION HERE
    print(array_2D)
You may run into trouble with pool.close(); following @hpaulj's comment, you can simply remove that line if you have problems...
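If pool.close() causes trouble, one alternative on Python 3.3 and later is to use the pool as a context manager, which shuts the pool down when the block exits. This is only a minimal sketch assuming Python 3; the helper name build_array is made up for the example and is not part of the answer above.

```python
import numpy as np
from multiprocessing import Pool


def fill_array(start_val):
    # One row: ten consecutive integers starting at start_val.
    return list(range(start_val, start_val + 10))


def build_array(processes=4):
    # build_array is a hypothetical helper used only for this illustration.
    list_start_vals = range(40, 60)
    # Leaving the with-block calls pool.terminate(). pool.map() has already
    # returned every result by then, so no work is lost.
    with Pool(processes=processes) as pool:
        rows = pool.map(fill_array, list_start_vals)
    return np.array(rows)


if __name__ == '__main__':
    array_2D = build_array()
    print(array_2D)
```

Because map() blocks until all rows are back, there is no explicit close() or join() to get wrong here.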
If you still want to fill the array row by row, you can use pool.apply_async instead of pool.map. Working from Saullo's answer:
import numpy as np
from multiprocessing import Pool

def fill_array(start_val):
    return range(start_val, start_val+10)

if __name__=='__main__':
    pool = Pool(processes=4)
    list_start_vals = range(40, 60)
    array_2D = np.zeros((20,10))
    for line, val in enumerate(list_start_vals):
        result = pool.apply_async(fill_array, [val])
        array_2D[line,:] = result.get()
    pool.close()
    print(array_2D)
This is a bit slower than map. But it does not produce a runtime error like the one I got when testing the map version: Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored
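Part of the slowdown is probably that result.get() is called right after each apply_async, so the loop waits for one task to finish before submitting the next. A sketch that submits all tasks first and collects the results afterwards might look like this; the helper name fill_with_apply_async is hypothetical, and whether it is actually faster will depend on how expensive the real work is.

```python
import numpy as np
from multiprocessing import Pool


def fill_array(start_val):
    # One row: ten consecutive integers starting at start_val.
    return list(range(start_val, start_val + 10))


def fill_with_apply_async(processes=4):
    # fill_with_apply_async is a hypothetical helper for this illustration.
    list_start_vals = range(40, 60)
    array_2D = np.zeros((20, 10))
    pool = Pool(processes=processes)
    # Submit every task up front so the workers can run them concurrently.
    pending = [pool.apply_async(fill_array, (val,)) for val in list_start_vals]
    pool.close()  # no further tasks will be submitted
    # Collect in submission order; get() blocks until that result is ready.
    for line, result in enumerate(pending):
        array_2D[line, :] = result.get()
    pool.join()
    return array_2D


if __name__ == '__main__':
    print(fill_with_apply_async())
```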
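The problem is caused by running pool.map inside the for loop. The map() method is functionally equivalent to the built-in map(), except that the individual tasks run in parallel. So in your case pool.map(fill_array, list_start_vals) is called 20 times, once per iteration of the for loop, and each call processes the whole input list in parallel. The code below should work.
Code: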
#!/usr/bin/python
import numpy
from multiprocessing import Pool

def fill_array(start_val):
    return range(start_val,start_val+10)

if __name__ == "__main__":
    array_2D = numpy.zeros((20,10))
    pool = Pool(processes = 4)
    list_start_vals = range(40,60)

    # running pool.map in a for loop is wrong
    #for line in xrange(20):
    #    array_2D[line,:] = pool.map(fill_array,list_start_vals)

    # get the result of pool.map (list of values returned by fill_array)
    # in a pool_result list
    pool_result = pool.map(fill_array,list_start_vals)

    # the pool is processing its inputs in parallel; close() and join()
    # can be used to synchronize the main process
    # with the task processes to ensure proper cleanup.
    pool.close()
    pool.join()

    # Now assign the pool_result to your numpy array
    for line,result in enumerate(pool_result):
        array_2D[line,:] = result

    print(array_2D)
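To make the comparison with the built-in map() concrete, the sketch below runs the same function serially and through a pool and returns both results. It assumes Python 3, and compare_with_builtin_map is a made-up helper name, not something from the answer above.

```python
import numpy as np
from multiprocessing import Pool


def fill_array(start_val):
    # One row: ten consecutive integers starting at start_val.
    return list(range(start_val, start_val + 10))


def compare_with_builtin_map(processes=4):
    # compare_with_builtin_map is a hypothetical helper for this illustration.
    list_start_vals = range(40, 60)
    # Built-in map(): one call evaluates fill_array for every start value.
    serial = list(map(fill_array, list_start_vals))
    # Pool.map(): same inputs, same order of results, computed by workers.
    pool = Pool(processes=processes)
    parallel = pool.map(fill_array, list_start_vals)
    pool.close()
    pool.join()
    return serial, parallel


if __name__ == "__main__":
    serial, parallel = compare_with_builtin_map()
    print(serial == parallel)
    print(np.array(parallel).shape)
```

A single pool.map call already returns all 20 rows of 10 values, which is why it should be called once rather than once per row.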