
Python multiprocessing using a lock or manager list for Pool workers accessing a global list variable

I am trying to distribute jobs over several CUDA devices so that the total number of running jobs at any time is less than or equal to the number of CPU cores available. To do this, I determine the number of available "slots" on each device and create a list that holds them. With 6 CPU cores and two CUDA devices (0 and 1), AVAILABLE_GPU_SLOTS = [0, 1, 0, 1, 0, 1]. In my worker function I pop a slot from the list and save it to a variable, set the CUDA_VISIBLE_DEVICES environment variable to that slot in the subprocess call, and then append the slot back to the list. This has been working so far, but I want to avoid race conditions.
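For illustration, the slot list described above could be built by cycling through the devices round-robin. The question does not show how build_available_gpu_slots is implemented, so the body below is an assumed sketch that only reproduces the stated example, not the asker's actual code.

```python
def build_available_gpu_slots(pool_size, cuda_devices):
    """Assumed implementation: hand out pool_size slots round-robin over the devices."""
    if not cuda_devices:
        raise ValueError('at least one CUDA device is required')
    return [cuda_devices[i % len(cuda_devices)] for i in range(pool_size)]


if __name__ == '__main__':
    # 6 CPU cores, devices 0 and 1
    print(build_available_gpu_slots(6, [0, 1]))  # -> [0, 1, 0, 1, 0, 1]
```

With an odd pool size the lower-numbered devices simply receive one extra slot.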

Current code is as follows:

import multiprocessing
import os
import subprocess

# YANK_FILES, build_cmd, get_cuda_devices and build_available_gpu_slots are not shown here.

def work(cmd):
    # Take a free GPU slot, run the job on that device, then hand the slot back.
    slot = AVAILABLE_GPU_SLOTS.pop()
    exit_code = subprocess.call(cmd, shell=False, env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot)))
    AVAILABLE_GPU_SLOTS.append(slot)
    return exit_code

if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()
    mols_to_be_run = [name for name in os.listdir(YANK_FILES) if os.path.isdir(os.path.join(YANK_FILES, name))]
    cmds = build_cmd(mols_to_be_run)
    cuda = get_cuda_devices()
    AVAILABLE_GPU_SLOTS = build_available_gpu_slots(pool_size, cuda)
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2)
    pool.map(work, cmds)

Can I simply declare lock = multiprocessing.Lock() at the same level as AVAILABLE_GPU_SLOTS, put it in cmds, and then inside work() do

with lock:
    slot = AVAILABLE_GPU_SLOTS.pop()
# subprocess stuff
with lock:
    AVAILABLE_GPU_SLOTS.append(slot)

or do I need a manager list? Alternatively, maybe there is a better solution to what I'm doing.
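One possible alternative, sketched here under assumptions, is to keep the free slots in a multiprocessing.Queue instead of a list. This swaps in a different technique from the one in the question: get() blocks until a slot is free and put() returns it, so no separate lock is needed for the hand-off. The names init_worker and run_all, and the demo command that just prints the variable, are hypothetical.

```python
import multiprocessing
import os
import subprocess
import sys

_slots = None  # set in every worker process by init_worker


def init_worker(slot_queue):
    # Pool initializer: store the shared queue as a global in each worker.
    global _slots
    _slots = slot_queue


def work(cmd):
    slot = _slots.get()  # blocks until a GPU slot is free
    try:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot))
        return subprocess.call(cmd, shell=False, env=env)
    finally:
        _slots.put(slot)  # hand the slot back even if the call raised


def run_all(cmds, slots, pool_size):
    slot_queue = multiprocessing.Queue()
    for slot in slots:
        slot_queue.put(slot)
    pool = multiprocessing.Pool(processes=pool_size,
                                initializer=init_worker,
                                initargs=(slot_queue,))
    try:
        return pool.map(work, cmds)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Demo command (an assumption): print the device the job was given.
    cmd = [sys.executable, '-c',
           'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])']
    print(run_all([cmd] * 4, slots=[0, 1], pool_size=2))
```

The queue is passed through the Pool initializer rather than inside cmds, because a multiprocessing.Queue is meant to be shared with child processes at creation time, not pickled as a task argument.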

Based on what I found in the SO answer "Python sharing a lock between processes":

Using a regular list leads to each process having its own copy, as expected. Using a manager list seems to be sufficient to get around that. Example code:

from __future__ import print_function  # print() works on Python 2 and 3

import multiprocessing
import time

def doing_work(honk):
    # SLOTS_LIST is a global inherited from the parent process (fork start method).
    proc = multiprocessing.current_process()
    # Lock-protected variant, left disabled:
    # with lock:
    #     print(proc, 'about to pop SLOTS_LIST', SLOTS_LIST)
    #     slot = SLOTS_LIST.pop()
    #     print(proc, ' just popped', slot, 'from', SLOTS_LIST)
    print(proc, 'about to pop SLOTS_LIST', SLOTS_LIST)
    slot = SLOTS_LIST.pop()
    print(proc, ' just popped', slot, 'from SLOTS_LIST')
    time.sleep(10)

def init(l):
    # Pool initializer: expose the shared lock as a global in each worker.
    global lock
    lock = l

if __name__ == '__main__':
    man = multiprocessing.Manager()
    SLOTS_LIST = man.list([1, 34, 3465, 456, 4675, 6, 4])
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(processes=2, initializer=init, initargs=(l,))
    inputs = range(len(SLOTS_LIST))
    pool.map(doing_work, inputs)

which outputs

<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456, 4675, 6, 4]
<Process(PoolWorker-3, started daemon)>  just popped 4 from SLOTS_LIST
<Process(PoolWorker-2, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456, 4675, 6]
<Process(PoolWorker-2, started daemon)>  just popped 6 from SLOTS_LIST
<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456, 4675]
<Process(PoolWorker-3, started daemon)>  just popped 4675 from SLOTS_LIST
<Process(PoolWorker-2, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456]
<Process(PoolWorker-2, started daemon)>  just popped 456 from SLOTS_LIST
<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1, 34, 3465]    
<Process(PoolWorker-3, started daemon)>  just popped 3465 from SLOTS_LIST
<Process(PoolWorker-2, started daemon)> about to pop SLOTS_LIST [1, 34]
<Process(PoolWorker-2, started daemon)>  just popped 34 from SLOTS_LIST
<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1]
<Process(PoolWorker-3, started daemon)>  just popped 1 from SLOTS_LIST

which is the desired behavior. I'm not sure whether it completely eliminates race conditions, but it seems to be good enough. Adding a lock on top of the manager list is simple enough.
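A minimal sketch of combining the two, a manager list guarded by a lock, applied to the GPU-slot worker from the question. It assumes the list holds at least as many slots as there are pool processes, so pop() never sees an empty list. The name run_all, the use of functools.partial to pass the proxies, and the demo command are assumptions made for illustration; manager proxies can be pickled, which is why they are passed as task arguments here.

```python
import functools
import multiprocessing
import os
import subprocess
import sys


def work(slots, lock, cmd):
    with lock:
        slot = slots.pop()  # raises IndexError if no slot is left
    try:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot))
        return subprocess.call(cmd, shell=False, env=env)
    finally:
        with lock:
            slots.append(slot)  # return the slot even if the call raised


def run_all(cmds, slot_values, pool_size):
    manager = multiprocessing.Manager()
    slots = manager.list(slot_values)  # shared list proxy
    lock = manager.Lock()              # shared lock proxy
    pool = multiprocessing.Pool(processes=pool_size)
    try:
        return pool.map(functools.partial(work, slots, lock), cmds)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Demo command (an assumption): print the device the job was given.
    cmd = [sys.executable, '-c',
           'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])']
    print(run_all([cmd] * 4, slot_values=[0, 1], pool_size=2))
```

The lock is held only around pop() and append(), never across the subprocess call, so jobs still run in parallel.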
