Running the same function in parallel for multiple files in Python
I am trying to run a function in parallel for multiple files, and I want all of the calls to finish before a certain point.
For example, there is a loop:
def main():
    for item in list:
        function_x(item)
    function_y(list)
Now what I want is that function_x should run in parallel for all items, but it should have finished for all items before function_y is called.
I am planning to use Celery for this, but I cannot figure out how to do it.
Here is my final test code. All I needed to do was use the multiprocessing library.
from multiprocessing import Process
from time import sleep

procs = []

def function_x(i):
    for j in range(5):
        sleep(3)
        print(i)

def function_y():
    print("done")

def main():
    for i in range(3):
        print("Process started")
        p = Process(target=function_x, args=(i,))
        procs.append(p)
        p.start()
    # block until all the processes finish (i.e. until all function_x calls finish)
    for p in procs:
        p.join()
    function_y()
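A shorter way to get the same fan-out-then-join behaviour (a sketch, not part of the original answer, with a stand-in body for function_x) is multiprocessing.Pool, whose map call blocks until every worker has finished:

```python
from multiprocessing import Pool

def function_x(i):
    # stand-in for the real per-file work
    return i * i

if __name__ == '__main__':
    # Pool.map blocks until function_x has run for every item,
    # so anything placed after it (e.g. function_y) only runs
    # once all the parallel work is done.
    with Pool(processes=3) as pool:
        results = pool.map(function_x, range(3))
    print(results)  # [0, 1, 4]
```

Pool also takes care of joining the worker processes for you when the with block exits.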
You can use threads for this. Thread.join is the function you need; it blocks until the thread is finished. You can do this:
import threading

threads = []

def main():
    for item in list:
        t = threading.Thread(target=function_x, args=(item,))
        threads.append(t)
        t.start()
    # block until all the threads finish (i.e. until all function_x calls finish)
    for t in threads:
        t.join()
    function_y(list)
You can do this elegantly with Ray, which is a library for writing parallel and distributed Python. Simply declare function_x with the @ray.remote decorator; it can then be executed in parallel by invoking it with function_x.remote, and the results can be retrieved with ray.get.
import ray
import time

ray.init()

@ray.remote
def function_x(item):
    time.sleep(1)
    return item

def function_y(list):
    pass

list = [1, 2, 3, 4]

# Process the items in parallel.
results = ray.get([function_x.remote(item) for item in list])
function_y(list)
View the Ray documentation.
Here is the documentation for celery groups, which is what I think you want. Use AsyncResult.get() instead of AsyncResult.ready() to block.
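As a sketch of what that looks like with a celery group (the broker and backend URLs below are hypothetical placeholders, and this requires a running broker plus a worker process, so it is not runnable on its own):

```python
from celery import Celery, group

# hypothetical broker/backend URLs -- adjust to your setup
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/1')

@app.task
def function_x(item):
    return item * item  # stand-in for the real per-file work

def function_y(values):
    print(values)

def main(items):
    # fan out one task per item and wait for all of them
    job = group(function_x.s(item) for item in items)
    result = job.apply_async()
    values = result.get()  # blocks until every task has finished
    function_y(values)
```

The group primitive runs the tasks in parallel across your workers, and result.get() gives you the barrier before function_y.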
#!/usr/bin/env python
import concurrent.futures

def function_x(item):
    return item * item

def function_y(lst):
    return [x * x for x in lst]

a_list = range(10)

if __name__ == '__main__':
    with concurrent.futures.ThreadPoolExecutor(10) as tp:
        future_to_function_x = {
            tp.submit(function_x, item): item
            for item in a_list
        }
        results = {}
        for future in concurrent.futures.as_completed(future_to_function_x):
            item = future_to_function_x[future]
            try:
                res = future.result()
            except Exception as e:
                print('Exception when processing item "%s": %s' % (item, e))
            else:
                results[item] = res
    print('results:', results)
    after = function_y(results.values())
    print('after:', after)
Output:
results: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
after: [0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]