How to process input in parallel with python, but without processes?
I have a list of input data and would like to process it in parallel, but processing each item takes time because network I/O is involved. CPU usage is not a problem.
I would not like to have the overhead of additional processes, since I have a lot of things to process at a time and do not want to set up inter-process communication.
# the parallel execution equivalent of this?
import time

input_data = [1, 2, 3, 4, 5, 6, 7]
input_processor = time.sleep
results = list(map(input_processor, input_data))
The code I am using makes use of twisted.internet.defer, so a solution involving that is fine as well.
You can easily define Worker threads that work in parallel until a queue is empty.
from threading import Thread
from collections import deque
import time

# Create a new class that inherits from Thread
class Worker(Thread):
    def __init__(self, inqueue, outqueue, func):
        '''
        A worker that calls func on objects in inqueue and
        pushes the result into outqueue.
        Runs until inqueue is empty.
        '''
        self.inqueue = inqueue
        self.outqueue = outqueue
        self.func = func
        super().__init__()

    # Override the run method; it is started when
    # you call worker.start()
    def run(self):
        while True:
            try:
                # another worker may empty the queue between
                # iterations, so catch the IndexError instead
                # of testing emptiness first
                data = self.inqueue.popleft()
            except IndexError:
                break
            print('start')
            result = self.func(data)
            self.outqueue.append(result)
            print('finished')
def test(x):
    time.sleep(x)
    return 2 * x

if __name__ == '__main__':
    data = 12 * [1, ]
    queue = deque(data)
    result = deque()
    # create 3 workers working on the same input
    workers = [Worker(queue, result, test) for _ in range(3)]
    # start the workers
    for worker in workers:
        worker.start()
    # wait till all workers are finished
    for worker in workers:
        worker.join()
    print(result)
As expected, this runs in about 4 seconds: twelve 1-second tasks divided among 3 workers.
One could also write a simple Pool class to get rid of the noise in the main function:
from threading import Thread
from collections import deque
import time

class Pool():
    def __init__(self, n_threads):
        self.n_threads = n_threads

    def map(self, func, data):
        inqueue = deque(data)
        result = deque()
        workers = [Worker(inqueue, result, func) for i in range(self.n_threads)]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        return list(result)

class Worker(Thread):
    def __init__(self, inqueue, outqueue, func):
        '''
        A worker that calls func on objects in inqueue and
        pushes the result into outqueue.
        Runs until inqueue is empty.
        '''
        self.inqueue = inqueue
        self.outqueue = outqueue
        self.func = func
        super().__init__()

    # Override the run method; it is started when
    # you call worker.start()
    def run(self):
        while True:
            try:
                data = self.inqueue.popleft()
            except IndexError:
                break
            print('start')
            result = self.func(data)
            self.outqueue.append(result)
            print('finished')

def test(x):
    time.sleep(x)
    return 2 * x

if __name__ == '__main__':
    data = 12 * [1, ]
    pool = Pool(6)
    result = pool.map(test, data)
    print(result)
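For completeness: the standard library already ships a thread pool with this exact map interface in concurrent.futures, which avoids hand-rolling the Worker class. A minimal sketch, where slow_double is a stand-in for the asker's network-bound function:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_double(x):
    # stand-in for a network-bound call
    time.sleep(0.01)
    return 2 * x

data = [1, 2, 3, 4, 5, 6, 7]

# Executor.map dispatches items to worker threads and
# yields results in input order
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(slow_double, data))

print(results)  # [2, 4, 6, 8, 10, 12, 14]
```

Unlike the deque-based Pool above, the executor preserves input order, so no extra bookkeeping is needed to match results to inputs.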
You can use the multiprocessing module. Without knowing more about how you want it to process, you can use a pool of workers:
import multiprocessing as mp
import time

input_processor = time.sleep
core_num = mp.cpu_count()

if __name__ == '__main__':
    pool = mp.Pool(processes=core_num)
    # pass the function and its argument separately so the call
    # runs in the pool, not eagerly in the parent process
    results = [pool.apply_async(input_processor, (i,)) for i in range(1, 7 + 1)]
    result_final = [p.get() for p in results]
    for n in range(len(result_final)):
        print(n, result_final[n])
The above keeps track of the order in which each task was submitted. It also does not allow the processes to talk to each other.
Edited: to call this as a function, you should pass in the input data and the number of processes:
import numpy as np

def parallel_map(processor_count, input_data):
    pool = mp.Pool(processes=processor_count)
    results = [pool.apply_async(input_processor, (i,)) for i in input_data]
    result_final = np.array([p.get() for p in results])
    result_data = np.vstack((input_data, result_final))
    return result_data
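Since the asker specifically wants to avoid process overhead, it is worth noting that multiprocessing.pool.ThreadPool exposes the same apply_async/get API but is backed by threads. A sketch of the same pattern, where slow_double is a hypothetical stand-in for the I/O-bound work:

```python
from multiprocessing.pool import ThreadPool
import time

def slow_double(x):
    # stand-in for a network-bound call
    time.sleep(0.01)
    return 2 * x

pool = ThreadPool(processes=4)
# same pattern as above: pass the function and args separately
async_results = [pool.apply_async(slow_double, (i,)) for i in range(1, 8)]
results = [r.get() for r in async_results]  # ordered by submission
pool.close()
pool.join()
print(results)  # [2, 4, 6, 8, 10, 12, 14]
```

Because the workers are threads, there is no inter-process communication or serialization of arguments and results.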
I assume you are using Twisted. In that case, you can launch multiple deferreds and wait for the completion of all of them using DeferredList:
http://twistedmatrix.com/documents/15.4.0/core/howto/defer.html#deferredlist
If input_processor is a non-blocking call (returns a deferred):
from twisted.internet import defer

def main():
    input_data = [1, 2, 3, 4, 5, 6, 7]
    input_processor = asyn_function
    requests = []
    for entry in input_data:
        requests.append(defer.maybeDeferred(input_processor, entry))
    deferredList = defer.DeferredList(requests, consumeErrors=True)
    deferredList.addCallback(gotResults)
    return deferredList

def gotResults(results):
    for (success, value) in results:
        if success:
            print('Success:', value)
        else:
            print('Failure:', value.getErrorMessage())
In case input_processor is a long-running/blocking function, you can use deferToThread instead of maybeDeferred:
from twisted.internet import defer, threads

def main():
    input_data = [1, 2, 3, 4, 5, 6, 7]
    input_processor = syn_function
    requests = []
    for entry in input_data:
        requests.append(threads.deferToThread(input_processor, entry))
    deferredList = defer.DeferredList(requests, consumeErrors=True)
    deferredList.addCallback(gotResults)
    return deferredList