
How to use multiprocessing in python

I'm new to Python and I want to do parallel programming in the following code, using the multiprocessing module. How should I modify the code? I've been searching for a way to do it with Pool, but found only limited examples that I can follow. Can anyone help me? Thank you.

Note that setinner and setouter are two independent functions, and that's where I want to use parallel programming to reduce the running time.

def solve(Q,G,n):
    i = 0
    tol = 10**-4

    while i < 1000:

        inneropt,partition,x = setinner(Q,G,n)
        outeropt = setouter(Q,G,n)

        if (outeropt - inneropt)/(1 + abs(outeropt) + abs(inneropt)) < tol:
            break

        node1 = partition[0]
        node2 = partition[1]

        G = updateGraph(G,node1,node2)
        if i == 999:
            print "Maximum iteration reached"
        i += 1
    print inneropt

It's hard to parallelize code that needs to mutate the same shared data from different tasks. So, I'm going to assume that setinner and setouter are non-mutating functions; if that's not true, things will be more complicated.

The first step is to decide what you want to do in parallel.


One obvious thing is to run setinner and setouter at the same time. They're completely independent of each other, and both always need to get done. So, that's what I'll do. Instead of doing this:

inneropt,partition,x = setinner(Q,G,n)
outeropt = setouter(Q,G,n)

… we want to submit the two functions as tasks to the pool, then wait for both to be done, then get the results of both.

The concurrent.futures module (which requires a third-party backport in Python 2.x) makes it easier to do things like "wait for both to be done" than the multiprocessing module (which is in the stdlib in 2.6+), but in this case, we don't need anything fancy; if one of them finishes early, we don't have anything to do until the other finishes anyway. So, let's stick with multiprocessing.apply_async:

import multiprocessing

pool = multiprocessing.Pool(2) # we never have more than 2 tasks to run
while i < 1000:
    # parallelly start both tasks
    inner_result = pool.apply_async(setinner, (Q, G, n))
    outer_result = pool.apply_async(setouter, (Q, G, n))

    # sequentially wait for both tasks to finish and get their results
    inneropt,partition,x = inner_result.get()
    outeropt = outer_result.get()

    # the rest of your loop is unchanged
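For comparison, here is roughly what the same step looks like with the concurrent.futures module mentioned above (a sketch, not part of the original answer; on Python 2.x it needs the third-party futures backport):

from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=2)
while i < 1000:
    # submit both tasks, then block on each result in turn
    inner_future = executor.submit(setinner, Q, G, n)
    outer_future = executor.submit(setouter, Q, G, n)

    inneropt,partition,x = inner_future.result()
    outeropt = outer_future.result()

    # the rest of your loop is unchanged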

You may want to move the pool outside the function so it lives forever and can be used by other parts of your code. If not, you almost certainly want to shut the pool down at the end of the function. (Later versions of multiprocessing let you just use the pool in a with statement, but I think that requires Python 3.3+, so here you have to do it explicitly.)
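If you do keep the pool inside the function, a minimal sketch of the explicit shutdown looks like this (close tells the pool no more tasks are coming, join waits for the workers to exit):

def solve(Q,G,n):
    pool = multiprocessing.Pool(2)
    try:
        # ... the while loop from above, submitting with pool.apply_async ...
        pass
    finally:
        pool.close()
        pool.join()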


What if you want to do more work in parallel? Well, there's nothing else obvious to do here without restructuring the loop. You can't do updateGraph until you get the results back from setinner and setouter, and nothing else is slow here.

But if you could reorganize things so that each loop's setinner were independent of everything that came before (which may or may not be possible with your algorithm; without knowing what you're doing, I can't guess), you could push 2000 tasks onto the queue up front, then loop by just grabbing results as needed. For example:

pool = multiprocessing.Pool() # let it default to the number of cores
inner_results = []
outer_results = []
for i in range(1000):
    inner_results.append(pool.apply_async(setinner, (Q,G,n,i)))
    outer_results.append(pool.apply_async(setouter, (Q,G,n,i)))

i = 0
while i < 1000:
    inneropt,partition,x = inner_results.pop(0).get()
    outeropt = outer_results.pop(0).get()
    # the rest of your loop is the same as before

Of course you can make this fancier.

For example, let's say you rarely need more than a couple hundred iterations, so it's wasteful to always compute 1000 of them. You can just push the first N at startup, and push one more every time through the loop (or N more every N times) so you never do more than N wasted iterations. You can't get an ideal tradeoff between perfect parallelism and minimal waste, but you can usually tune it pretty nicely.
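Continuing the restructured example above, a minimal sketch of that scheme might look like this (N is a placeholder you would tune, and the extra index argument is the same assumption as in the earlier sketch):

N = 50  # how far ahead to queue work

pool = multiprocessing.Pool()
inner_results = [pool.apply_async(setinner, (Q,G,n,j)) for j in range(N)]
outer_results = [pool.apply_async(setouter, (Q,G,n,j)) for j in range(N)]

i = 0
while i < 1000:
    # keep the queue topped up so we always stay about N tasks ahead
    if i + N < 1000:
        inner_results.append(pool.apply_async(setinner, (Q,G,n,i+N)))
        outer_results.append(pool.apply_async(setouter, (Q,G,n,i+N)))

    inneropt,partition,x = inner_results.pop(0).get()
    outeropt = outer_results.pop(0).get()
    # the rest of your loop is the same as before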

Also, if the tasks don't actually take that long, but you have a lot of them, you may want to batch them up. One really easy way to do this is to use one of the map variants instead of apply_async; this can make your fetching code a tiny bit more complicated, but it makes the queuing and batching code completely trivial (e.g., mapping each func over a list of 100 parameters with a chunksize of 10 is just two simple lines of code).
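As a rough illustration of that (with a throwaway square function standing in for the real per-task work, since map passes each item as a single argument):

import multiprocessing

def square(x):  # stand-in for the real work; defined at module level so it can be pickled
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    # map square over 100 parameters, handing them to the workers in chunks of 10
    results = pool.map(square, range(100), chunksize=10)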
