
Python multiprocessing with shared library and memory persistence

I am using Python as a driver to run thousands of numerical simulations. Since the initialization procedure is the same for every simulation, I can save time by running the (time-consuming) initialization once, backing up the initial state of the vectors in memory, and simply restoring those vectors for each subsequent simulation.

A small working example is:

from ctypes import cdll

try:
    lib = cdll.LoadLibrary('shared_lib.so')
    print lib.set_to_backup() # this tells the program to save the initial state
    print lib.simulate("cmd.txt") # initializes, backs up state, saves an internal variable -backed_up- to true and simulates
    print lib.simulate("cmd2.txt") # sees the internal state -backed_up- equal to true and skips initialisation, restores from memory and simulates
except:
    import traceback
    traceback.print_exc()

This works perfectly, and I can run several simulations (cmd.txt, cmd2.txt, ...) without reinitializing.

Now I want to parallelize this procedure using multiprocessing. Each process should load the library once, and its first simulation should initialize, back up the state, and simulate. Each subsequent simulation should restore the initial state and simulate. The example for one process:

from ctypes import cdll
from multiprocessing import Process

try:
    lib = cdll.LoadLibrary('shared_lib.so')
    print lib.set_to_backup() # this tells the program to save the initial state
    p1 = Process(target=lib.simulate, args=("cmd.txt",)) # initializes, backs up state, sets the internal flag -backed_up- to true and simulates
    p1.start()
    p1.join()
    print p1.exitcode

    p2 = Process(target=lib.simulate, args=("cmd2.txt",)) # (should) see the internal flag -backed_up- set to true, skip initialization, restore from memory and simulate
    p2.start()
    p2.join()
    print p2.exitcode
except:
    import traceback
    traceback.print_exc()

The first process does the job correctly (I can see it in the trace), but the second process doesn't see the -backed_up- variable in lib and re-initializes everything.
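For illustration, the same behaviour can be reproduced with a plain Python global instead of the shared library: each Process gets its own copy of the parent's memory, so a write made in one child is never seen by the parent or by a sibling process. A minimal sketch (the names here are made up for the demo):

from multiprocessing import Process

backed_up = False  # stands in for the library's internal flag

def fake_simulate():
    global backed_up
    backed_up = True  # modifies this child's copy of memory only
    print 'child sees backed_up =', backed_up

p1 = Process(target=fake_simulate)
p1.start()
p1.join()
print 'parent still sees backed_up =', backed_up  # False: the child's write is gone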

I tried not declaring a new process and instead rerunning p1.start() to restart the same process, but it fails (AssertionError: cannot start a process twice).

-backed_up- is a global variable in lib and should remain in memory between calls to lib.simulate (as it does in the first example).
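As a side note, if backed_up happens to be exported as a plain int symbol from the shared library (an assumption; it may well be static and not visible), ctypes can read it directly, which is handy for checking which copy of the state each process actually sees:

from ctypes import cdll, c_int

lib = cdll.LoadLibrary('shared_lib.so')
# in_dll reads a global variable out of the loaded library;
# this assumes 'backed_up' is an exported int symbol.
flag = c_int.in_dll(lib, 'backed_up')
print 'backed_up =', flag.value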

I run Debian 7 (Linux) and use Python 2.7.3.

Does anyone have an idea how to make this work?

I managed to get it to work using a queue, heavily inspired by this answer: https://stackoverflow.com/a/6672593/801468. The trick is to keep each worker process alive: the library is loaded once per worker, so its -backed_up- state survives across all the simulations that the worker pulls from the queue.

import multiprocessing
from ctypes import cdll

num_procs = 2

def worker():
    # Each worker loads the library once; the library's state (including
    # the -backed_up- flag) then persists for the lifetime of the worker.
    lib = cdll.LoadLibrary('shared_lib.so')
    print lib.set_to_backup()
    # Pull commands off the queue until the None sentinel arrives.
    for DST in iter(q.get, None):
        print 'treating:', DST
        print lib.simulate(DST)
        q.task_done()
    q.task_done()  # account for the sentinel itself

q = multiprocessing.JoinableQueue()  # created before the fork, so the workers inherit it
procs = []
for i in range(num_procs):
    procs.append(multiprocessing.Process(target=worker))
    procs[-1].daemon = True
    procs[-1].start()

list_of_DST = ["cmd1.txt", "cmd2.txt"]
for DST in list_of_DST:
    q.put(DST)

q.join()  # wait until every command has been processed

# One sentinel per worker tells the workers to break out of their loop.
for p in procs:
    q.put(None)

q.join()  # wait for the sentinels to be consumed

for p in procs:
    p.join()

print "Finished everything...."
print "Active children:", multiprocessing.active_children()
