
Is there any way to use multiprocessing.pool within a nested function or module?

Thanks for taking a look at this. I confess I have only been dabbling with parallel processing in Python for about a week now, so I apologize if there is an obvious solution I missed. I have a piece of code from which I would like to run several different instances of mp.Pool(). The pools called from the main .py file worked fine, but when I tried to add them to functions in modules I get no output from them at all; the app just runs past them and continues. I think it may have something to do with this post, but it didn't give any ideas on alternative methods to accomplish what I need. The code that works in a simple example is this:

import multiprocessing as mp

thread_count = 4  # was undefined in the original snippet

def multiproc_log_result(retval):
    results.append(retval)
    if len(results) % (10 // 10) == 0:
        print('{0}% done'.format(100 * len(results) / 10))

def meat():
    print('beef')
    status = True
    return status

results = []
pool = mp.Pool(thread_count)
for x in range(10):
    pool.apply_async(meat, callback=multiproc_log_result)
pool.close()
pool.join()


def veggie():
    print('carrot')
    status = True
    return status

results = []
pool = mp.Pool(thread_count)
for x in range(10):
    pool.apply_async(veggie, callback=multiproc_log_result)
pool.close()
pool.join()

And the code that doesn't work is:

import multiprocessing as mp

thread_count = 4  # was undefined in the original snippet

def multiproc_log_result(retval):
    results.append(retval)
    if len(results) % (10 // 10) == 0:
        print('{0}% done'.format(100 * len(results) / 10))

def meat():
    print('beef')
    status = True
    return status

results = []
pool = mp.Pool(thread_count)
for x in range(10):
    pool.apply_async(meat, callback=multiproc_log_result)
pool.close()
pool.join()

def nested_stupid_fn():
    def multiproc_log_result(retval):
        results.append(retval)
        if len(results) % (10 // 10) == 0:
            print('{0}% done'.format(100 * len(results) / 10))

    def veggie():
        print('carrot')
        status = True
        return status

    results = []
    pool = mp.Pool(thread_count)
    for x in range(10):
        pool.apply_async(veggie, callback=multiproc_log_result)
    pool.close()
    pool.join()

nested_stupid_fn()

Ultimately I would like the example that doesn't work to be one more step removed, by having it live in another function in a separate module. So that when I import the module packngo and use it as packngo.basic_packngo(inputs), with the contents of the nested function somewhere within it, they would run. Any help would be greatly appreciated. :D I am a very simple man, so if you could explain it as you would to a child, maybe then it will sink into my head!

The other question you linked has the solution, it's just not spelled out: you cannot use nested functions as the func argument for the apply*/map* family of methods on multiprocessing.Pool. They work for multiprocessing.dummy.Pool, because multiprocessing.dummy is backed by threads that can pass function references around directly, but multiprocessing.Pool must pickle the functions, and only functions with importable names can be pickled. If you check the name of a nested function, it's something like modulename.outerfuncname.<locals>.innerfuncname, and that <locals> component makes it impossible to import (which is usually a good thing; nested functions that make use of being nested usually have critical state in closure scope, which mere importing would lose).
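To see this concretely, here is a minimal sketch (the outer/inner names are purely illustrative) showing the <locals> qualified name and the resulting pickling failure for a nested function:

```python
import pickle

def outer():
    def inner():
        return 'carrot'
    return inner

f = outer()
print(f.__qualname__)  # outer.<locals>.inner

try:
    pickle.dumps(f)      # this is what multiprocessing.Pool must do with func
except Exception as e:   # CPython: "Can't pickle local object 'outer.<locals>.inner'"
    print('pickling failed:', e)
```

The same pickle step runs inside apply_async, which is why the failure surfaces silently in the worker rather than at the call site.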

It's perfectly fine for the callback functions to be defined in a nested fashion; since they're executed in the parent process, they aren't sent to the workers. In your case, only the callback relies on closure scope, so it's perfectly fine to move the func (veggie) out to global scope, defining your packngo module as:

import multiprocessing as mp

thread_count = 4  # was undefined in the original snippet

def veggie():
    print('carrot')
    status = True
    return status

def nested_stupid_fn():
    def multiproc_log_result(retval):
        results.append(retval)
        if len(results) % (10 // 10) == 0:
            print('{0}% done'.format(100 * len(results) / 10))

    results = []
    pool = mp.Pool(thread_count)
    for x in range(10):
        pool.apply_async(veggie, callback=multiproc_log_result)
    pool.close()
    pool.join()

nested_stupid_fn()

Yes, it means veggie becomes a public member of the module in question. You can prefix it with an underscore (_veggie) if you want to indicate that it should be considered an implementation detail, but it must necessarily be global to use it with multiprocessing.Pool.
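As noted above, the pickling restriction does not apply to multiprocessing.dummy.Pool, which is thread-backed, so a nested worker function is allowed there unchanged. A minimal sketch (the function name and pool size are illustrative, not from the question):

```python
from multiprocessing.dummy import Pool  # thread-backed: no pickling of func

def nested_dummy_fn():
    results = []

    def veggie():  # nested is fine here: threads pass the reference directly
        return 'carrot'

    pool = Pool(4)
    for x in range(10):
        results.append(pool.apply_async(veggie))
    pool.close()
    pool.join()
    return [r.get() for r in results]

print(nested_dummy_fn())  # list of ten 'carrot' strings
```

This trades process isolation for convenience, so it only helps when the work releases the GIL or is I/O-bound.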

Well, I think the issue is that inside the scope of multiproc_log_result the variable results doesn't exist. So what you should do is append the result of your async call directly to results. You won't be able to track the progress, though (there is no way to directly share a global variable with a callback function outside a class, I guess):

from multiprocessing.pool import ThreadPool

thread_count = 4  # was undefined in the original snippet

def nested_stupid_fn():
    def multiproc_log_result(retval):
        results.append(retval)

    def veggie():
        print('carrot')
        status = True
        return status

    results = []
    pool = ThreadPool(thread_count)
    for x in range(10):
        results.append(pool.apply_async(veggie))

    pool.close()
    pool.join()

    results = [result.get() for result in results]  # get value from async result

    # ...then do stuff with results
