
Python multiprocessing.Pool does not start right away

I want to input text to Python and process it in parallel. For that purpose I use multiprocessing.Pool. The problem is that sometimes, though not always, I have to input text multiple times before anything is processed.

This is a minimal version of my code to reproduce the problem:

import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

if __name__ == '__main__':
    p = None
    while True:
        message = input('In: ')
        if not p:
            p = mp.Pool()
        p.apply_async(do_something, (message,))

What happens is that I have to input text multiple times before I get a result, no matter how long I wait after the first input. (As stated above, this does not happen every time.)

python3 test.py
In: a
In: a
In: a
In: Out: a
Out: a
Out: a

If I create the pool before the while loop, or if I add time.sleep(1) after creating the pool, it seems to work every time. Note: I do not want to create the pool before I get an input.

Does anyone have an explanation for this behavior?

I'm running Windows 10 with Python 3.4.2.

EDIT: Same behavior with Python 3.5.1.


EDIT:

An even simpler example, with Pool and also with ProcessPoolExecutor. I think the problem is the call to input() right after applying/submitting, which only seems to be a problem the first time something is applied/submitted.

import concurrent.futures
import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

# ProcessPoolExecutor
# if __name__ == '__main__':
#     with concurrent.futures.ProcessPoolExecutor() as executor:
#         executor.submit(do_something, 'a')
#         input('In:')
#         print('done')

# Pool
if __name__ == '__main__':
    p = mp.Pool()
    p.apply_async(do_something, ('a',))
    input('In:')
    p.close()
    p.join()
    print('done')

Your code works when I tried it on my Mac.

In Python 3, it might help to explicitly declare how many processes will be in your pool (i.e. the number of simultaneous worker processes).

Try using p = mp.Pool(1):

import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

if __name__ == '__main__':
    p = None
    while True:
        message = input('In: ')
        if not p:
            p = mp.Pool(1)
        p.apply_async(do_something, (message,))

I could not reproduce it on Windows 7, but there are a few long shots worth mentioning for your issue.

  1. Your antivirus software might be interfering with the newly spawned processes; try temporarily disabling it and see if the issue is still present.
  2. Windows 10 might have a different IO caching algorithm; try inputting larger strings. If that works, it means the OS is trying to be smart and only sends data once a certain amount has piled up.
  3. As Windows has no fork() primitive, the delay you see might be caused by the spawn start method.
  4. Python 3 added a new pool of workers called ProcessPoolExecutor; I'd recommend you use it regardless of the issue you're suffering from.
