
Correctly implementing Python Multiprocessing

I was going through the multiprocessing tutorial here: http://pymotw.com/2/multiprocessing/basics.html

I wrote the script below as an exercise. The script appears to work, and I do see 5 new python processes running in Task Manager. However, my print statement shows the same folder being searched multiple times.

I suspect that instead of splitting the work between the processes, I'm giving each process the entire workload. I'm pretty certain I'm doing something wrong, and inefficiently. Could someone please point out my errors?

What I have so far:

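import email
import multiprocessing
import os
import shutil

# sample_path, tmp_path and the database cursor `cur` are assumed to be
# defined elsewhere in the script.

def email_finder(msg_id):
    for folder in os.listdir(sample_path):
        print "Searching through folder: ", folder
        folder_path = os.path.join(sample_path, folder)
        # list the folder itself (not the result of another listdir call)
        for file in os.listdir(folder_path):
            if file.endswith('.eml'):
                file_path = os.path.join(folder_path, file)
                with open(file_path) as f:
                    email_obj = email.message_from_file(f)
                if msg_id in email_obj.as_string().lower():
                    shutil.copy(file_path, os.path.join(tmp_path, file))
                    return 'Found: ', file_path
    # only reached if no file matched
    return 'Not Found!'

def worker():
    # every process that runs this executes the full query itself
    msg_ids = cur.execute("select msg_id from my_table").fetchall()
    for i in msg_ids:
        msg_id = i[0].encode('ascii')
        if msg_id != '':
            email_finder(msg_id)
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    # wait for all workers to finish
    for p in jobs:
        p.join()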

Each of your subprocesses gets its own cursor and runs the query itself, and therefore iterates over the entire set of IDs.
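A small sketch that reproduces the effect, written for Python 3 and assuming a hypothetical three-item `MSG_IDS` list in place of the database query: every process walks the full list, so with five processes each ID is handled five times.

```python
import collections
import multiprocessing

# Hypothetical stand-in for the rows returned by the DB query.
MSG_IDS = ["id-1", "id-2", "id-3"]


def worker(out_queue):
    # Mirrors the question's worker(): each process walks the whole list.
    for msg_id in MSG_IDS:
        out_queue.put(msg_id)


def run(n_processes=5):
    queue = multiprocessing.Queue()
    jobs = [multiprocessing.Process(target=worker, args=(queue,))
            for _ in range(n_processes)]
    for p in jobs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    handled = [queue.get() for _ in range(n_processes * len(MSG_IDS))]
    for p in jobs:
        p.join()
    return handled


if __name__ == "__main__":
    # Each ID is counted once per process: 5 times each with 5 processes.
    print(collections.Counter(run()))
```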

You need to read the msg_ids from your DB once, in the parent process, and then distribute them among the sub-processes, instead of letting each sub-process run the query on its own.
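A minimal sketch of that approach using `multiprocessing.Pool` (one option; splitting the list by hand across `Process` objects would also work). It is written for Python 3. `load_msg_ids()` is a hypothetical stand-in for the single `cur.execute(...).fetchall()` call, and `email_finder()` here only reports which process handled each ID instead of searching the folders.

```python
import multiprocessing
import os


def email_finder(msg_id):
    # Stand-in for the real per-ID folder search: it just reports which
    # process handled the ID, so the split of work is visible.
    return msg_id, os.getpid()


def load_msg_ids():
    # Hypothetical stand-in for the one DB query, e.g.
    # cur.execute("select msg_id from my_table").fetchall()
    rows = [("id-1",), ("id-2",), ("",), ("id-3",), ("id-4",), ("id-5",)]
    # Flatten the rows and drop empty IDs, as the original worker() did.
    return [row[0] for row in rows if row[0] != ""]


def main():
    msg_ids = load_msg_ids()  # queried exactly once, in the parent
    with multiprocessing.Pool(processes=5) as pool:
        # map() hands each ID to one worker process, so every ID is
        # searched exactly once; results come back in input order.
        results = pool.map(email_finder, msg_ids)
    return results


if __name__ == "__main__":
    for msg_id, pid in main():
        print(msg_id, "handled by process", pid)
```

Because only the ID is sent to each worker, the child processes never need their own database connection or cursor.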
