
Multiprocessing Pool not working - For loop inside function

I'm trying to get this function to work asynchronously (I have tried asyncio, ThreadPoolExecutor and ProcessPoolExecutor, still with no luck). It takes around 11 seconds on my PC to complete a batch of 500 items, and there is no difference compared to a plain for loop, so I assume it doesn't work as expected (in parallel).

Here is the function:

from unidecode import unidecode
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)

def is_it_bad(word):
    for item in all_names:
        if str(word) in str(item['name']):
            return item
    item = {'name':word, 'gender': 2}
    return item

def check_word(arr):
    fname = unidecode(str(arr[1]['fullname'] + ' ' + arr[1]['username'])).replace('([^a-z ]+)', ' ').lower()
    fname = fname + ' ' + fname.replace(' ', '')
    fname = fname.split(' ')
    genders = []
    for chunk in fname:
        if len(chunk) > 2:
            genders.append(int(is_it_bad('_' + chunk + '_')['gender']))        
    if set(genders) == {2}:        
        followers[arr[0]]['gender'] = 2
        #results_new.append(name)
    elif set([0,1]).issubset(genders):
        followers[arr[0]]['gender'] = 2
        #results_new.append(name)
    else:
        if 0 in genders:
            followers[arr[0]]['gender'] = 0
            #results_new.append(name)
        else:
            followers[arr[0]]['gender'] = 1
            #results_new.append(name)

results = pool.map(check_word, [(idx, name) for idx, name in enumerate(names)]) 

Can you please help me with this?

You are using the module multiprocessing.dummy.

According to the Python documentation,

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

The threading module does not provide the same speedup as the multiprocessing module for CPU-bound work like this: in standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the threads effectively run serially. For more information on how to use the multiprocessing module, visit this tutorial (no affiliation).

In it, the author uses both multiprocessing.dummy and multiprocessing to accomplish two different tasks. You'll notice that multiprocessing is the module used to provide the speedup. Just switch to that module and you should see an improvement.
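To make the difference concrete, here is a minimal sketch that runs the same CPU-bound job through both pool types. The function `count_primes` and the helper `timed_map` are hypothetical and not part of the question's code; the actual timings depend on the machine, so none are claimed here.

```python
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool


def count_primes(limit):
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(pool_cls, inputs, workers=4):
    """Map count_primes over inputs with the given pool class; return (results, seconds)."""
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        results = pool.map(count_primes, inputs)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [20000] * 8
    thread_results, thread_secs = timed_map(ThreadPool, inputs)
    process_results, process_secs = timed_map(Pool, inputs)
    # Both pools return the same results; only the elapsed time should differ.
    print(f"threads:   {thread_secs:.2f}s")
    print(f"processes: {process_secs:.2f}s")
```

Both classes expose the same `map` API, which is why swapping the import is usually all it takes to try the process-based version.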

I am unable to run your code because of the unidecode package, but here is how I have used multiprocessing in previous projects, adapted to your code:

import multiprocessing

# Number of worker processes: one per CPU core
max_workers = multiprocessing.cpu_count()
# max_workers = multiprocessing.cpu_count() - 1  # I prefer to leave one core free if I still want to use my device

# The guard is needed on platforms that spawn worker processes (e.g. Windows)
if __name__ == '__main__':
    # Create the pool and map the function over the inputs
    with multiprocessing.Pool(max_workers) as p:
        results = p.map(check_word, [(idx, name) for idx, name in enumerate(names)])
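One caveat when switching to processes: each worker gets its own copy of module-level data, so the assignments to `followers` inside `check_word` would not be visible in the parent process. A possible workaround, sketched below, is to have the worker return its result and let the parent apply it. The names `classify`, `classify_all` and `KNOWN_NAMES` are hypothetical stand-ins, and the lookup rule is a placeholder rather than the question's matching logic.

```python
from multiprocessing import Pool

# Hypothetical lookup table standing in for all_names (0 and 1 are genders, 2 is unknown).
KNOWN_NAMES = {"anna": 0, "mark": 1}


def classify(task):
    """Hypothetical stand-in for check_word: return (index, gender) instead of mutating a global."""
    idx, record = task
    # Placeholder rule for illustration only; the real matching against all_names goes here.
    gender = KNOWN_NAMES.get(record["fullname"].lower(), 2)
    return idx, gender


def classify_all(records, workers=2):
    """Send the records to worker processes, then apply the results in the parent."""
    with Pool(workers) as pool:
        results = pool.map(classify, list(enumerate(records)))
    for idx, gender in results:
        records[idx]["gender"] = gender  # the mutation happens in the parent process
    return records


if __name__ == "__main__":
    followers = [
        {"fullname": "Anna", "username": "anna01"},
        {"fullname": "Mark", "username": "mark02"},
    ]
    print(classify_all(followers))
```

Returning small tuples also keeps the data sent between processes to a minimum, since every argument and result has to be pickled.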

Let me know if this works or helps!

Edit: Added some comments to the code
