How to implement multiprocessing in Python?

I want to use multiprocessing in Python to sort independent lists.
For example, I have a dictionary with an int as the key and a list as the value.

I tried to implement a simple program, but I am having difficulty storing the sorted lists in a defaultdict again and returning it to the main module.

from multiprocessing import Process

def fun(id, user_data):
    user_data.sort()
    return user_data

# users_data is a defaultdict with an id as the key and a list as the value

if __name__ == '__main__':
    for id, user_data in users_data.items():
        P = Process(target=fun, args=(id, user_data))
        P.start()
        P.join()

You'll need to use a Manager to share data between processes.
Also, as @Tomerikoo mentioned in the comments, the way you are doing it right now will not actually result in multiprocessing: calling P.join() right after P.start() makes the script pause until that process finishes, so execution is serial rather than parallel.

You can do something like this:

from multiprocessing import Process, Manager

def sort_list(user_id, user_data, interprocess_dict):
    user_data.sort()
    interprocess_dict[user_id] = user_data


users_data = {}
users_data[1] = [5, 2, 1]
users_data[3] = [10, 12, 1]


def main():
    interprocess_dict = Manager().dict()
    processes = []
    for user_id, user_data in users_data.items():
        proc = Process(target=sort_list, args=(user_id, user_data, interprocess_dict,))
        processes.append(proc)
        proc.start()

    for proc in processes:
        proc.join()
    
    for user_id, user_data in interprocess_dict.items():
        print('{}: {}'.format(user_id, user_data))


if __name__ == '__main__':
    main()
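
For context on why the shared dictionary is needed, here is a minimal sketch (the name sort_in_place is made up for illustration, not taken from the question). Each child process works on its own copy of the arguments, and Process ignores whatever the target function returns, so neither an in-place sort nor a return statement reaches the parent:

```python
from multiprocessing import Process


def sort_in_place(data):
    # Runs in the child process, on the child's own copy of `data`.
    data.sort()
    return data  # Process discards the target's return value


if __name__ == '__main__':
    numbers = [5, 2, 1]
    proc = Process(target=sort_in_place, args=(numbers,))
    proc.start()
    proc.join()
    # The parent's list is untouched: the child sorted a separate copy.
    print(numbers)  # prints [5, 2, 1]
```

This is why the result has to travel back through something explicitly shared, such as the Manager dict used above.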

EDIT:

It's better to limit the number of processes to the number of hardware CPU units available, since sorting a list is a CPU-bound operation.

import multiprocessing as mp


def sort_list(user_id, user_data, interprocess_dict):
    user_data.sort()
    interprocess_dict[user_id] = user_data


def prepare_data():
    users_data = {}
    for i in range(1000):
        users_data[i] = list(range(10000, 0, -1))
    return users_data


def main():
    # mp.set_start_method('spawn')  # Optional: 'spawn' works on every platform and is the default on Windows and macOS
    interprocess_dict = mp.Manager().dict()
    pool = mp.Pool(mp.cpu_count())
    users_data = prepare_data()
    for user_id, user_data in users_data.items():
        pool.apply_async(sort_list, args = (user_id, user_data, interprocess_dict,))
    pool.close()
    pool.join()
    for user_id, user_data in interprocess_dict.items():
        print('{}: {}'.format(user_id, user_data))


if __name__ == '__main__':
    main()
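
As a possible alternative, not part of the original answer, the pool can hand results back through return values, which avoids the Manager dict entirely. The sketch below assumes each list can be pickled and sent to a worker; sort_item and sort_all are hypothetical helper names:

```python
import multiprocessing as mp


def sort_item(item):
    # Hypothetical helper: takes a (user_id, list) pair and
    # returns (user_id, sorted copy of the list).
    user_id, user_data = item
    return user_id, sorted(user_data)


def sort_all(users_data, processes=None):
    # processes=None lets Pool pick the worker count (the CPU count by default).
    with mp.Pool(processes) as pool:
        # pool.map blocks until every pair is processed and keeps input order.
        return dict(pool.map(sort_item, users_data.items()))


if __name__ == '__main__':
    users_data = {1: [5, 2, 1], 3: [10, 12, 1]}
    print(sort_all(users_data))  # prints {1: [1, 2, 5], 3: [1, 10, 12]}
```

Here the workers never touch shared state: each one receives a pair, returns a pair, and the parent rebuilds the dictionary from the returned values.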
