How to create global lock/semaphore with multiprocessing.pool in Python?

I want to limit resource access in child processes, for example to limit HTTP downloads, disk IO, etc. How can I achieve this by extending this basic code?

Please share some basic code examples.

pool = multiprocessing.Pool(multiprocessing.cpu_count())
while job_queue.is_jobs_for_processing():
    for job in job_queue.pull_jobs_for_processing():
        pool.apply_async(do_job, (job,), callback=callback)
pool.close()
pool.join()

Use the initializer and initargs arguments when creating a pool to define a global in all the child processes.

For instance:

from multiprocessing import Pool, Lock
from time import sleep

def do_job(i):
    "The greater i is, the shorter the function waits before returning."
    with lock:
        sleep(1-(i/10.))
        return i

def init_child(lock_):
    global lock
    lock = lock_

def main():
    lock = Lock()
    poolsize = 4
    with Pool(poolsize, initializer=init_child, initargs=(lock,)) as pool:
        results = pool.imap_unordered(do_job, range(poolsize))
        print(list(results))

if __name__ == "__main__":
    main()

This code will print out the numbers 0-3 in ascending order (the order in which the jobs were submitted), because the lock serializes the jobs. Comment out the with lock: line to see it print the numbers in descending order.

This solution works on both Windows and Unix. However, because processes can fork on Unix systems, on Unix it is enough to declare the global variable at module scope: the child process gets a copy of the parent's memory, which includes a still-working lock object. The initializer therefore isn't strictly needed there, but it can help document how the code is intended to work. When multiprocessing is able to create processes by forking, the following also works.

from multiprocessing import Pool, Lock
from time import sleep

lock = Lock()

def do_job(i):
    "The greater i is, the shorter the function waits before returning."
    with lock:
        sleep(1-(i/10.))
        return i

def main():
    poolsize = 4
    with Pool(poolsize) as pool:
        results = pool.imap_unordered(do_job, range(poolsize))
        print(list(results))

if __name__ == "__main__":
    main()
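As a sketch of applying the same initializer pattern to the disk-IO case from the question, the lock can serialize writes to a shared file from all pool workers. The file path and record format here are assumptions for illustration only:

```python
import os
import tempfile
from multiprocessing import Pool, Lock

def init_child(lock_, path_):
    # Store the inherited lock and the shared file path as globals
    # in each worker process.
    global lock, path
    lock = lock_
    path = path_

def write_record(i):
    # Hold the lock so only one worker appends to the file at a time.
    with lock:
        with open(path, "a") as f:
            f.write("record %d\n" % i)
    return i

def main():
    lock = Lock()
    # Hypothetical shared output file, placed in the temp directory.
    path = os.path.join(tempfile.gettempdir(), "demo_records.txt")
    open(path, "w").close()  # start from an empty file
    with Pool(4, initializer=init_child, initargs=(lock, path)) as pool:
        pool.map(write_record, range(8))
    with open(path) as f:
        lines = f.read().splitlines()
    print(len(lines))  # all 8 records were written, none interleaved or lost

if __name__ == "__main__":
    main()
```

The same pattern applies to any resource that must not be touched by two workers at once; only the body of the with lock: block changes.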

Use a global semaphore and acquire it when accessing a resource. For example:

import multiprocessing
from time import sleep

semaphore = multiprocessing.Semaphore(2)

def do_job(id):
    with semaphore:
        sleep(1)
    print("Finished job")

def main():
    pool = multiprocessing.Pool(6)
    for job_id in range(6):
        print("Starting job")
        pool.apply_async(do_job, [job_id])
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()

This program finishes only two jobs every second because the other processes are waiting on the semaphore.
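Note that the module-level Semaphore above relies on fork, so on Windows each spawned child would re-create its own semaphore and nothing would be limited. Combining this semaphore approach with the initializer pattern from the first answer gives a sketch that should work on both platforms (the sleep is a stand-in for real work):

```python
from multiprocessing import Pool, Semaphore
from time import sleep

def init_child(sem_):
    # Make the semaphore passed at pool creation a global in each worker.
    global semaphore
    semaphore = sem_

def do_job(i):
    # At most two workers execute this section at any one time.
    with semaphore:
        sleep(0.2)
    return i

def main():
    sem = Semaphore(2)
    with Pool(6, initializer=init_child, initargs=(sem,)) as pool:
        results = sorted(pool.map(do_job, range(6)))
    print(results)  # prints [0, 1, 2, 3, 4, 5]

if __name__ == "__main__":
    main()
```

The semaphore must be passed through initargs rather than pickled as a task argument, because multiprocessing synchronization primitives can only be shared with children at process creation time.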
