简体   繁体   中英

Pickle in multiprocessing

The goal of following code is merely to fill array a with 0 to 9:

import multiprocessing
import numpy as np
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()

inputs = range(10)
a = np.zeros(10)

def processInput():
    def testNested(t):
        a[t]= t
    Parallel(n_jobs=num_cores, backend="threading")(delayed(testNested)(t) for t in range(0, 10))

processInput()

I get the pickle error when I'm trying to call multiprocess in a function:

AttributeError: Can't pickle local object 'processInput.<locals>.testNested'

Question: Any suggestion how to achieve this goal, in a case I have to operate multiprocess within other functions?

按照错误消息, 如文档所述,嵌套函数不能被腌制,您应该在模块的顶层定义工作函数。

There are examples shown in built-in multiprocessing package, I slightly modified first one, which includes using Pool class:

class multiprocessing.pool.Pool([processes[, initializer[, initargs[, maxtasksperchild[, context]]]]])

A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.

If initializer is not None then each worker process will call initializer(*initargs) when it starts.

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None , which means worker processes will live as long as the pool.

context can be used to specify the context used for starting the worker processes. Usually a pool is created using the function multiprocessing.Pool() or the Pool() method of a context object. In both cases context is set appropriately.

Note that the methods of the pool object should only be called by the process which created the pool.

Therefore you don't really need to import and use joblib , because Pool uses the number of CPUs in the system by default:

from multiprocessing import Pool

def f(x):
    return x

if __name__ == '__main__':
    with Pool() as p:
        print(p.map(f, list(range(10))))

however same result could be achieved by writing "one-liner":

print(list(range(10)))

I mapped pool p to function f() in case you want to do something more complex than just assigning values to index.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM