
Running same function for multiple files in parallel in Python

I am trying to run a function in parallel for multiple files, and I want all of the calls to finish before a certain point.

For example, there is a loop:

def main():
  for item in list:
     function_x(item)

  function_y(list)

Now what I want is for function_x to run in parallel for all items, but it must have finished for every item before function_y is called.
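The requested behaviour can also be sketched with multiprocessing.Pool, which maps function_x over the items in parallel and blocks until all of them are done (the bodies of function_x and function_y below are placeholder assumptions, not the asker's real code):

```python
from multiprocessing import Pool

def function_x(item):
    # Placeholder per-item work; the real function would process a file.
    return item * item

def function_y(results):
    # Runs only after every function_x call has finished.
    return sum(results)

if __name__ == '__main__':
    with Pool() as pool:
        # pool.map blocks until all items have been processed.
        results = pool.map(function_x, [1, 2, 3])
    print(function_y(results))
```

Because pool.map does not return until every worker has finished, function_y is guaranteed to see the complete set of results.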

I am planning to use Celery for this, but I can't work out how to do it.

Here is my final test code.

All I needed to do was use the multiprocessing library.

from multiprocessing import Process
from time import sleep

procs = []

def function_x(i):
    for j in range(0, 5):
        sleep(3)
        print(i)

def function_y():
    print("done")

def main():
    for i in range(0, 3):
        print("Process started")
        p = Process(target=function_x, args=(i,))
        procs.append(p)
        p.start()

    # Block until all the processes finish
    # (i.e. until every function_x call has completed).
    for p in procs:
        p.join()

    function_y()

You can use threads for this. Thread.join is the function you need: it blocks until the thread has finished. You can do it like this:

import threading
threads = []
def main():
  for item in list:
     t = threading.Thread(target=function_x, args=(item,))
     threads.append(t)
     t.start()

  # block until all the threads finish (i.e. until all function_x calls finish)
  for t in threads:
     t.join()

  function_y(list)

You can do this elegantly with Ray, which is a library for writing parallel and distributed Python.

Simply declare function_x with @ray.remote; it can then be executed in parallel by invoking it with function_x.remote, and the results can be retrieved with ray.get.

import ray
import time

ray.init()

@ray.remote
def function_x(item):
    time.sleep(1)
    return item

def function_y(list):
    pass

list = [1, 2, 3, 4]

# Process the items in parallel.
results = ray.get([function_x.remote(item) for item in list])

function_y(list)

View the Ray documentation.

Here is the documentation for celery groups, which is what I think you want. Use AsyncResult.get() instead of AsyncResult.ready() to block.

#!/bin/env python

import concurrent.futures

def function_x(item):
    return item * item


def function_y(lst):
    return [x * x for x in lst]


a_list = range(10)


if __name__ == '__main__':

    with concurrent.futures.ThreadPoolExecutor(10) as tp:

        future_to_function_x = {
            tp.submit(function_x, item): item
            for item in a_list
        }


    results = {}

    for future in concurrent.futures.as_completed(future_to_function_x):

        item = future_to_function_x[future]

        try:
            res = future.result()
        except Exception as e:
            print('Exception when processing item "%s": %s' % (item, e))
        else:
            results[item] = res


    print('results:', results)

    after = function_y(results.values())

    print('after:', after)

Output:

results: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
after: [0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]
