
Python Multiprocessing - number of processes

I am executing the code below and it works fine, but it is not spawning separate processes: sometimes all the tasks run in the same process, and sometimes two run in one process. I am using a 4-CPU machine. What is wrong with this code?

import multiprocessing
from multiprocessing import Pool

def f(values):
    print(multiprocessing.current_process())
    return values

def main():
    p = Pool(4)  # number of processes = number of CPUs
    keys, values = zip(*data.items())  # ordered keys and values
    processed_values = p.map(f, values)
    result = dict(zip(keys, processed_values))
    p.close()  # no more tasks
    p.join()   # wrap up current tasks

And the result is

<SpawnProcess(SpawnPoolWorker-1, started daemon)>
<SpawnProcess(SpawnPoolWorker-1, started daemon)>
<SpawnProcess(SpawnPoolWorker-1, started daemon)>
<SpawnProcess(SpawnPoolWorker-1, started daemon)>

And sometimes like this,

<SpawnProcess(SpawnPoolWorker-3, started daemon)>
<SpawnProcess(SpawnPoolWorker-2, started daemon)>
<SpawnProcess(SpawnPoolWorker-1, started daemon)>
<SpawnProcess(SpawnPoolWorker-3, started daemon)>

Sometimes,

<SpawnProcess(SpawnPoolWorker-1, started daemon)>
<SpawnProcess(SpawnPoolWorker-4, started daemon)>
<SpawnProcess(SpawnPoolWorker-2, started daemon)>
<SpawnProcess(SpawnPoolWorker-1, started daemon)>

And my question is: on what basis does Pool assign work to the workers? I am writing the code so that it decides the number of processes based on the number of keys in my dictionary (my data will always have fewer keys than CPUs). The flow is: the main code reads a file and builds a dictionary from it in a single process, then it should fan the work out to a number of concurrent processes and wait for them to process the data (I am using pool.map for that), and once it gets the results from the child processes it continues processing them. How can I achieve this "parent waits for the child processes" step?

There's nothing wrong with your code. Your work item is just very fast - so fast that it's possible for the same worker process to run the function, return the result, and then win the race to consume the next task from the internal queue that multiprocessing.Pool uses to distribute work. When you call map, the work items are broken into batches and placed on a Queue. Here's the part of the implementation of pool.map that chunks up the items in iterable and puts them on the queue:

    task_batches = Pool._get_tasks(func, iterable, chunksize)
    result = MapResult(self._cache, chunksize, len(iterable), callback)
    self._taskqueue.put((((result._job, i, mapstar, (x,), {}) 
                          for i, x in enumerate(task_batches)), None))
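
If you don't pass a chunksize to map, Pool picks one for you. The default is computed roughly like the following (so with your four items and four workers the chunksize works out to 1, and each item becomes its own task):

    chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
    if extra:
        chunksize += 1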

Each worker process runs a function that has an infinite while loop that consumes items from that queue*:

while maxtasks is None or (maxtasks and completed < maxtasks):
    try:
        task = get()  # Pulls an item off the taskqueue
    except (EOFError, IOError):
        debug('worker got EOFError or IOError -- exiting')
        break

    if task is None:
        debug('worker got sentinel -- exiting')
        break

    job, i, func, args, kwds = task
    try:
        result = (True, func(*args, **kwds))  # Runs the function you passed to map
    except Exception as e:
        result = (False, e)
    try:
        put((job, i, result))  # Sends the result back to the parent
    except Exception as e:
        wrapped = MaybeEncodingError(e, result[1])
        debug("Possible encoding error while sending result: %s" % (
            wrapped))

It's likely that the same worker has just by chance been able to consume an item, run func, and then consume the next item. This is somewhat strange - I can't reproduce it on my machine running the same code as your example - but having the same worker grab two of four items from the queue is pretty normal.
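
If you want to see the distribution for yourself, here's a minimal self-contained sketch (the helper name which_worker is just illustrative, not anything from the Pool API) that tallies how many of the fast tasks each worker ends up running:

    import multiprocessing
    from collections import Counter

    def which_worker(_):
        # Report the name of the worker process that ran this task.
        return multiprocessing.current_process().name

    if __name__ == '__main__':
        with multiprocessing.Pool(4) as pool:
            names = pool.map(which_worker, range(4))
        print(Counter(names))
        # Possible output on a spawn platform:
        # Counter({'SpawnPoolWorker-1': 3, 'SpawnPoolWorker-2': 1})

Run it a few times and you'll see the tally shift between runs, which is exactly the nondeterminism you're observing.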

You should see an even distribution if you make your worker function take longer, for example by inserting a call to time.sleep:

import time

def f(values):
    print(multiprocessing.current_process())
    time.sleep(1)  # slow the task down so no worker can finish and steal the next item
    return values
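
Two side notes on the structure you described: pool.map already blocks until all of the children have returned their results, so the "parent waits for the child processes" step you asked about is exactly what the p.map(f, values) line does (close/join afterwards just shuts the workers down). Also, your output shows SpawnPoolWorker, which means your platform starts children with the spawn method; with spawn, the code that creates the Pool has to live under an if __name__ == '__main__': guard, and f has to be defined at module level, or each child will re-import the module and try to create its own pool.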

* This is actually not quite true - there's a thread that runs in the main process that consumes from taskqueue, and then sticks what it pulls out into another Queue, and that's what the child processes consume from.
