Python multiprocessing synchronization
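I have a function called "function" that I want to call 10 times using multiprocessing: 2 rounds of 5 CPUs each.
So I need a way to synchronize the processes, as described in the code below.
Is this possible without using a multiprocessing pool? When I use one, I get strange errors (for example "UnboundLocalError: local variable 'fd' referenced before assignment", and I don't have such a variable). The processes also seem to terminate randomly.
If possible, I would like to do this without a pool. Thanks!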
    number_of_cpus = 5
    number_of_iterations = 2

    # An array for the processes.
    processing_jobs = []

    # Start 5 processes 2 times.
    for iteration in range(0, number_of_iterations):

        # TODO SYNCHRONIZE HERE

        # Start 5 processes at a time.
        for cpu_number in range(0, number_of_cpus):

            # Calculate an offset for the current function call.
            file_offset = iteration * cpu_number * number_of_files_per_process

            p = multiprocessing.Process(target=function, args=(file_offset,))
            processing_jobs.append(p)
            p.start()

        # TODO SYNCHRONIZE HERE
This is an (anonymized) traceback of the error I get when I run the code in a pool:
    Process Process-5:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "python_code_3.py", line 88, in function_x
        xyz = python_code_1.function_y(args)
      File "/python_code_1.py", line 254, in __init__
        self.WK = file.WK(filename)
      File "/python_code_2.py", line 1754, in __init__
        self.__parse__(name, data, fast_load)
      File "/python_code_2.py", line 1810, in __parse__
        fd.close()
    UnboundLocalError: local variable 'fd' referenced before assignment
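Most of the processes crash like this, but not all of them. When I increase the number of processes, more of them seem to crash. I also thought this might be due to memory limitations...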
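A note on the traceback: it ends in fd.close(), which suggests that fd is assigned somewhere inside a try block and closed in a cleanup path. This is only a guess, since python_code_2.py is not shown. If the call that assigns fd raises (for instance because the file cannot be opened), the cleanup itself fails with UnboundLocalError and hides the real error. A minimal sketch of that pattern, using the made-up functions parse_masking and parse_safe:

```python
import os
import tempfile


def parse_masking(path):
    """Hypothetical parser whose cleanup touches fd even if open() failed."""
    try:
        fd = open(path, "rb")
        return fd.read()
    finally:
        # If open() raised, fd was never assigned, so this line raises
        # UnboundLocalError and replaces the original error.
        fd.close()


def parse_safe(path):
    """Same logic, but the file is only closed if it was actually opened."""
    with open(path, "rb") as fd:
        return fd.read()


if __name__ == "__main__":
    # A path inside a fresh temporary directory, so it cannot exist yet.
    missing = os.path.join(tempfile.mkdtemp(), "missing.bin")
    for parser in (parse_masking, parse_safe):
        try:
            parser(missing)
        except Exception as exc:
            print("%s raised %s" % (parser.__name__, type(exc).__name__))
```

The first function reports UnboundLocalError, the second reports the underlying file error. If the real code follows this pattern, logging the original exception before the cleanup runs should show why the open call fails more often as the number of processes grows.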
Pools are very easy to use. Here is a complete example:
    import multiprocessing

    def calc(num):
        return num*2

    if __name__=='__main__':  # required for Windows
        pool = multiprocessing.Pool()  # one Process per CPU
        for output in pool.map(calc, [1,2,3]):
            print('output: %s' % output)

Output:

    output: 2
    output: 4
    output: 6
Here is how you can do the synchronization you are asking for without using a pool:
    import multiprocessing

    def function(arg):
        print ("got arg %s" % arg)

    if __name__ == "__main__":
        number_of_cpus = 5
        number_of_iterations = 2
        number_of_files_per_process = 10  # Placeholder; the question does not give the real value.

        # An array for the processes.
        processing_jobs = []

        # Start 5 processes 2 times.
        for iteration in range(1, number_of_iterations+1):  # Start the range from 1 so we don't multiply by zero.

            # Start 5 processes at a time.
            for cpu_number in range(1, number_of_cpus+1):

                # Calculate an offset for the current function call.
                file_offset = iteration * cpu_number * number_of_files_per_process

                p = multiprocessing.Process(target=function, args=(file_offset,))
                processing_jobs.append(p)
                p.start()

            # Wait for all processes to finish.
            for proc in processing_jobs:
                proc.join()

            # Empty active job list.
            del processing_jobs[:]

            # Write file here
            print("Writing")
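A side note on the offset arithmetic in the loop above: iteration * cpu_number gives the same value for different pairs (for example 1 * 2 and 2 * 1), so two processes can end up with the same file_offset. Assuming each of the 10 calls is supposed to work on its own block of number_of_files_per_process files (the question does not say so explicitly), one way to get distinct offsets is to number the calls consecutively. The helper name compute_offset and the value 100 are made up for illustration:

```python
def compute_offset(iteration, cpu_number, number_of_cpus, files_per_process):
    """Return the start offset for one call; both counters start at zero."""
    call_index = iteration * number_of_cpus + cpu_number
    return call_index * files_per_process


if __name__ == "__main__":
    number_of_cpus = 5
    number_of_iterations = 2
    number_of_files_per_process = 100  # made-up value

    offsets = [
        compute_offset(i, c, number_of_cpus, number_of_files_per_process)
        for i in range(number_of_iterations)
        for c in range(number_of_cpus)
    ]
    print(offsets)
    # prints [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
```

Because the call index runs from 0 to 9 without repeats, the zero-based ranges from the question can be kept and no offset is used twice.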
And here is the same synchronization with a Pool:
    import multiprocessing

    def function(arg):
        print ("got arg %s" % arg)

    if __name__ == "__main__":
        number_of_cpus = 5
        number_of_iterations = 2
        number_of_files_per_process = 10  # Placeholder; the question does not give the real value.

        pool = multiprocessing.Pool(number_of_cpus)
        for i in range(1, number_of_iterations+1):  # Start the range from 1 so we don't multiply by zero
            file_offsets = [number_of_files_per_process * i * cpu_num for cpu_num in range(1, number_of_cpus+1)]
            # map() blocks until all 5 calls have returned.
            pool.map(function, file_offsets)
            print("Writing")
            # Write file here
As you can see, the Pool solution is nicer.
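This does not solve your traceback problem, though. It is hard for me to say how to fix that without knowing what is actually causing it. You may need to use a multiprocessing.Lock to synchronize access to the resource.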
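As a rough illustration of that last suggestion, here is a minimal sketch in which several processes append to one shared file and a multiprocessing.Lock makes sure only one of them writes at a time. Whether the crashes in the question come from unsynchronized access at all is unknown; the file name shared_output.txt and the function append_line are invented for the example.

```python
import multiprocessing
import os
import tempfile


def append_line(lock, path, text):
    """Append one line to the file at path while holding the lock."""
    with lock:
        # Only one process at a time gets past this point.
        with open(path, "a") as handle:
            handle.write(text + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shared_output.txt")
    lock = multiprocessing.Lock()

    jobs = []
    for number in range(5):
        p = multiprocessing.Process(
            target=append_line,
            args=(lock, path, "line from process %d" % number),
        )
        jobs.append(p)
        p.start()

    # Wait for all processes to finish before reading the file.
    for p in jobs:
        p.join()

    with open(path) as handle:
        print(handle.read())
```

The lock is created once in the parent and passed to every child as an argument, so all processes share the same lock object. The order of the lines in the file depends on scheduling and can differ between runs.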