Correctly implementing Python Multiprocessing

I was going through the multiprocessing tutorial here: http://pymotw.com/2/multiprocessing/basics.html

I wrote the script below as an exercise. It appears to work, and I do see 5 new python processes running in taskmgr. However, my print statement shows the same folder being searched multiple times.

I suspect that instead of splitting the work between the processes, I'm giving each process the entire workload. I'm pretty sure I'm doing something wrong, or at least inefficiently. Could someone please point out my errors?

What I have so far:

import email
import multiprocessing
import os
import shutil

# sample_path, tmp_path and cur (a database cursor) are assumed to be
# defined elsewhere in the script (not shown).

def email_finder(msg_id):
    for folder in os.listdir(sample_path):
        print "Searching through folder: ", folder
        folder_path = sample_path + '\\' + folder
        for file in os.listdir(folder_path):
            if file.endswith('.eml'):
                file_path = folder_path + '\\' + file
                email_obj = email.message_from_file(open(file_path))
                if msg_id in email_obj.as_string().lower():
                    shutil.copy(file_path, tmp_path + '\\' + file)
                    return 'Found: ', file_path
    else:
        return 'Not Found!'

def worker():
    msg_ids = cur.execute("select msg_id from my_table").fetchall()
    for i in msg_ids:
        msg_id = i[0].encode('ascii')
        if msg_id != '':
            email_finder(msg_id)
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

Each of your subprocesses gets its own cursor and runs the same query, and therefore iterates over the entire set of IDs.

You need to read the msg_ids from your DB once, in the parent process, and then distribute them among the sub-processes, instead of letting each sub-process run the query on its own.
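
A minimal sketch of that idea, written in Python 3 syntax (your script is Python 2) and assuming `multiprocessing.Pool` is acceptable in place of the hand-started `Process` objects. `load_msg_ids`, its hard-coded rows, and `run` are hypothetical stand-ins for your database query and driver code, and `email_finder` here is only a placeholder that reports which process received each ID rather than searching any folders:

```python
import multiprocessing


def load_msg_ids():
    # Hypothetical stand-in for the single database query, e.g.
    # cur.execute("select msg_id from my_table").fetchall()
    rows = [("id-a",), ("id-b",), ("",), ("id-c",), ("id-d",)]
    # Skip empty IDs, as the original worker() does.
    return [row[0] for row in rows if row[0] != ""]


def email_finder(msg_id):
    # Placeholder for the real folder search: it only reports which
    # process handled which ID, so the split of work is visible.
    return (msg_id, multiprocessing.current_process().name)


def run(processes=5):
    msg_ids = load_msg_ids()  # queried once, in the parent process only
    with multiprocessing.Pool(processes=processes) as pool:
        # map() hands each ID to exactly one worker process and
        # returns the results in the same order as msg_ids.
        return pool.map(email_finder, msg_ids)


if __name__ == "__main__":
    for msg_id, worker_name in run():
        print(msg_id, "handled by", worker_name)
```

Each ID should now show up exactly once in the output. The `if __name__ == "__main__":` guard matters on Windows, where child processes import the main module when they start.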
