
What is the difference between the 'spawn' start method and the default macOS start method in Python multiprocessing?

I am using Python 3.7.5. The following code works on my MacBook.

import multiprocessing as mp
from functools import partial
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

with mp.Pool(processes=mp.cpu_count()) as p:
    func = partial(doc_sentiment_computation_en, analyser=SentimentIntensityAnalyzer())
    documents_with_scores = p.map(func, all_responses)

However, it did not work on the production server, and I suspected this was because the default start method on the server's Linux machine is fork. So I tried the 'spawn' context instead, but that does not work on my Mac. Wasn't 'spawn' already the default start method on macOS anyway?

ctx = mp.get_context('spawn')
with ctx.Pool(processes=ctx.cpu_count()) as p:
    func = partial(doc_sentiment_computation_en, analyser=SentimentIntensityAnalyzer())
    documents_with_scores = p.map(func, all_responses)
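
For anyone trying the 'spawn' context, here is a minimal sketch of a layout that 'spawn' generally needs. With 'spawn', each worker re-imports the main module, so the worker function has to live at module level and the pool has to be created under an `if __name__ == "__main__":` guard. Whether a missing guard was the actual problem here is only a guess. `score_document` and `run_with_spawn` are hypothetical stand-ins, not the functions from the question.

```python
import multiprocessing as mp
from functools import partial


def score_document(text, offset):
    # Hypothetical stand-in for doc_sentiment_computation_en.
    # Defined at module level so spawned workers can import it by name.
    return len(text) + offset


def run_with_spawn(documents, offset=0, processes=2):
    # Request the 'spawn' start method explicitly instead of the platform default.
    ctx = mp.get_context("spawn")
    func = partial(score_document, offset=offset)
    with ctx.Pool(processes=processes) as pool:
        # map() blocks until every result has been collected.
        return pool.map(func, documents)


if __name__ == "__main__":
    # The guard stops spawned workers from re-running this block on import.
    print("default start method:", mp.get_start_method())
    print(run_with_spawn(["good", "bad day"]))
```

Printing `mp.get_start_method()` on both machines is a quick way to confirm which default each one really uses.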

I am new to multiprocessing, so please be kind to me.

As @Booboo pointed out, 'spawn' became the default on macOS only in Python 3.8, so that was not the cause. I was later able to test on a local Linux machine, and the code worked there too, so the problem seems to be specific to the server. I solved it by adding p.close() and p.join(), though I don't know why that helps: I thought the 'with' statement would by itself ensure proper acquisition and release of resources.

with mp.Pool(processes=mp.cpu_count()) as p:
    func = partial(find_set_intersection, set_2=search_keywords_lemmas)
    common_lemmas = p.map(func, all_responses_lemmas)
    p.close()
    p.join()
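
One detail that may be relevant: according to the multiprocessing documentation, leaving a `with` block on a Pool calls `terminate()`, not `close()` followed by `join()`. Whether that explains the behaviour on the server is not something this sketch can confirm. It only shows one way to write the explicit shutdown without a `with` block. `find_common` and `intersect_all` are hypothetical names standing in for the real functions.

```python
import multiprocessing as mp
from functools import partial


def find_common(items, reference):
    # Hypothetical stand-in for find_set_intersection.
    return set(items) & reference


def intersect_all(groups, reference, processes=2):
    func = partial(find_common, reference=reference)
    pool = mp.Pool(processes=processes)
    try:
        results = pool.map(func, groups)
        # close(): no more tasks will be submitted; workers exit once idle.
        pool.close()
    except BaseException:
        # On failure, stop the workers immediately.
        pool.terminate()
        raise
    finally:
        # join() is only valid after close() or terminate(); wait for workers to exit.
        pool.join()
    return results


if __name__ == "__main__":
    print(intersect_all([["a", "b"], ["c"]], {"b", "c"}))
```

The `try`/`finally` layout makes sure `join()` always runs after either `close()` or `terminate()`, since calling `join()` on a pool that is still open raises an error.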
