
Python multiprocessing using a lock or manager list for Pool workers accessing a global list variable

I am trying to distribute jobs over several CUDA devices such that the total number of running jobs at any time is less than or equal to the number of CPU cores available. To do this, I determine the number of available 'slots' on each device and build a list that holds the available slots. If I have 6 CPU cores and two CUDA devices (0 and 1), then AVAILABLE_SLOTS = [0, 1, 0, 1, 0, 1]. In my worker function I pop a slot off the list and save it to a variable, set the CUDA_VISIBLE_DEVICES env var in the subprocess call, and then append the slot back to the list. This has been working so far, but I want to avoid race conditions.

Current code is as follows:

import multiprocessing
import os
import subprocess

def work(cmd):
    # Take a free slot, run the job pinned to that CUDA device, then give the slot back.
    slot = AVAILABLE_GPU_SLOTS.pop()
    exit_code = subprocess.call(cmd, shell=False, env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot)))
    AVAILABLE_GPU_SLOTS.append(slot)
    return exit_code

if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()
    mols_to_be_run = [name for name in os.listdir(YANK_FILES) if os.path.isdir(os.path.join(YANK_FILES, name))]
    cmds = build_cmd(mols_to_be_run)
    cuda = get_cuda_devices()
    AVAILABLE_GPU_SLOTS = build_available_gpu_slots(pool_size, cuda)
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2)
    pool.map(work, cmds)

Can I simply declare lock = multiprocessing.Lock() at the same level as AVAILABLE_GPU_SLOTS, put it in cmds, and then inside work() do

with lock:
    slot = AVAILABLE_GPU_SLOTS.pop()
# subprocess stuff
with lock:
    AVAILABLE_GPU_SLOTS.append(slot)

or do I need a manager list? Alternatively, maybe there's a better solution to what I'm doing.
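
If the lock does need to reach the workers through the Pool initializer rather than through cmds (I believe a multiprocessing.Lock can't be pickled into pool.map arguments), I imagine the wiring would look roughly like this minimal sketch (the job names here are just placeholders):

import multiprocessing

def init(l):
    # Runs once in every worker process; store the shared Lock as a global there.
    global lock
    lock = l

def work(cmd):
    # Placeholder body: just demonstrate that every worker sees the same Lock.
    with lock:
        print(multiprocessing.current_process(), 'holds the lock for', cmd)

if __name__ == '__main__':
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(processes=2, initializer=init, initargs=(l,))
    pool.map(work, ['job-a', 'job-b', 'job-c'])
    pool.close()
    pool.join()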

Based on what I found in the following SO answer, Python sharing a lock between processes:

Using a regular list leads to each process having its own copy, as expected. Using a manager list seems to be sufficient to get around that. Example code:

import multiprocessing
import time

def doing_work(honk):
    proc = multiprocessing.current_process()
    # with lock:
    #     print(proc, 'about to pop SLOTS_LIST', SLOTS_LIST)
    #     slot = SLOTS_LIST.pop()
    #     print(multiprocessing.current_process(), ' just popped', slot, 'from', SLOTS_LIST)
    print(proc, 'about to pop SLOTS_LIST', SLOTS_LIST)
    slot = SLOTS_LIST.pop()
    print(multiprocessing.current_process(), ' just popped', slot, 'from SLOTS_LIST')
    time.sleep(10)

def init(l):
    # Store the shared Lock as a global inside each worker process.
    global lock
    lock = l

if __name__ == '__main__':
    man = multiprocessing.Manager()
    SLOTS_LIST = [1, 34, 3465, 456, 4675, 6, 4]
    SLOTS_LIST = man.list(SLOTS_LIST)  # proxy object shared by all workers
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(processes=2, initializer=init, initargs=(l,))
    inputs = range(len(SLOTS_LIST))
    pool.map(doing_work, inputs)

which outputs

<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456, 4675, 6, 4]
<Process(PoolWorker-3, started daemon)>  just popped 4 from SLOTS_LIST
<Process(PoolWorker-2, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456, 4675, 6]
<Process(PoolWorker-2, started daemon)>  just popped 6 from SLOTS_LIST
<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456, 4675]
<Process(PoolWorker-3, started daemon)>  just popped 4675 from SLOTS_LIST
<Process(PoolWorker-2, started daemon)> about to pop SLOTS_LIST [1, 34, 3465, 456]
<Process(PoolWorker-2, started daemon)>  just popped 456 from SLOTS_LIST
<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1, 34, 3465]    
<Process(PoolWorker-3, started daemon)>  just popped 3465 from SLOTS_LIST
<Process(PoolWorker-2, started daemon)> about to pop SLOTS_LIST [1, 34]
<Process(PoolWorker-2, started daemon)>  just popped 34 from SLOTS_LIST
<Process(PoolWorker-3, started daemon)> about to pop SLOTS_LIST [1]
<Process(PoolWorker-3, started daemon)>  just popped 1 from SLOTS_LIST

which is the desired behavior. I'm not sure whether it completely eliminates race conditions, but it seems good enough, and adding a lock on top of it is simple enough.
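For completeness, here is a minimal sketch of what the manager list plus a lock could look like when applied back to the original work() setup. The slot list and commands below are placeholders (the real ones would come from build_available_gpu_slots() and build_cmd()), and both the lock and the list proxy are handed to the workers through the Pool initializer:

import multiprocessing
import os
import subprocess

def init(l, slots):
    # Each worker gets the same Lock and the same Manager-list proxy at start-up.
    global lock, AVAILABLE_GPU_SLOTS
    lock = l
    AVAILABLE_GPU_SLOTS = slots

def work(cmd):
    with lock:
        slot = AVAILABLE_GPU_SLOTS.pop()
    try:
        exit_code = subprocess.call(cmd, shell=False,
                                    env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(slot)))
    finally:
        # Return the slot even if the subprocess call raises.
        with lock:
            AVAILABLE_GPU_SLOTS.append(slot)
    return exit_code

if __name__ == '__main__':
    man = multiprocessing.Manager()
    slots = man.list([0, 1, 0, 1, 0, 1])      # placeholder: 6 slots over CUDA devices 0 and 1
    lock = multiprocessing.Lock()
    cmds = [['nvidia-smi']] * 6               # placeholder commands
    pool = multiprocessing.Pool(processes=6, initializer=init, initargs=(lock, slots))
    print(pool.map(work, cmds))
    pool.close()
    pool.join()

With the pool size equal to the number of slots, the locked pop should never find the list empty, since each worker holds at most one slot at a time.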
