I'm working on a project that needs to run two different CPU-intensive functions. Hence using a multiproccessing approach seems to be the way to go. The challenge that I'm facing is that one function has a slower runtime than the other one. For the sake of argument lets say that execute
has a runtime of .1 seconds while update
takes a full second to run. The goal is that while update
is running execute
will have calculated an output value 10 times. Once update
has finished it needs to pass a set of parameters to execute
which can then continue generating an output with the new set of parameters. After sometime update
needs to run again and once more generate a new set of parameters.
Furthermore both functions will require a different set of input variables.
The image link below should hopefully visualize my conundrum a bit better.
function runtime visualisation
From what I've gathered ( https://zetcode.com/python/multiprocessing/ ) using an asymetric mapping approach might be the way to go, but it doesn't really seem to work. Any help is greatly appreciated.
Pseudo Code
from multiprocessing import Pool
from datetime import datetime
import time
import numpy as np
class MyClass():
def __init__(self, inital_parameter_1, inital_parameter_2):
self.parameter_1 = inital_parameter_1
self.parameter_2 = inital_parameter_2
def execute(self, input_1, input_2, time_in):
print('starting execute function for time:' + str(time_in))
time.sleep(0.1) # wait for 100 milliseconds
# generate some output
output = (self.parameter_1 * input_1) + (self.parameter_2 + input_2)
print('exiting execute function')
return output
def update(self, update_input_1, update_input_2, time_in):
print('starting update function for time:' + str(time_in))
time.sleep(1) # wait for 1 second
# generate parameters
self.parameter_1 += update_input_1
self.parameter_2 += update_input_2
print('exiting update function')
def smap(f):
return f()
if __name__ == "__main__":
update_input_1 = 3
update_input_2 = 4
input_1 = 0
input_2 = 1
# initialize class
my_class = MyClass(1, 2)
# total runtime (arbitrary)
runtime = int(10e6)
# update_time (arbitrary)
update_time = np.array([10, 10e2, 15e4, 20e5])
for current_time in range(runtime):
# if time equals update time run both functions simultanously until update is complete
if any(update_time == current_time):
with Pool() as pool:
res = pool.map_async(my_class.smap, [my_class.execute(input_1, input_2, current_time),
my_class.update(update_input_1, update_input_2, current_time)])
# otherwise run only execute
else:
output = my_class.execute(input_1, input_2,current_time)
# increment input
input_1 += 1
input_2 += 2
I confess to not being able to fully following your code vis-a-vis your description. But I see some issues:
update
is not returning any value other than None
, which is implicitly returned due to the lack of a return
statement.with Pool() ...:
block will call terminate
upon block exit, which is immediately after your call to pool.map_async
, which is non-blocking. But you have no provision to wait for the completion of this submitted task ( terminate
will most likely kill the running task before it completes).map_async
function is the worker function name and an iterable . But you are invoking method calls to execute
and update
from the current main process and using their return values as elements of the iterable and these return values are definitely not functions suitable for passing to smap
. So there is no multiprocessing being done and this is just plain wrong. I would therefore recommend the following changes at the very least. But note that this code potentially generates tasks much faster than they can be completed and you could have millions of tasks queued up to run given your current runtime
value, which could be quite a strain on system resources such as memory. So I've inserted some code that ensures that the rate of submitting tasks is throttled so that the number of incomplete submitted tasks is never more than three times the number of CPU cores available.
# we won't need heavy-duty numpy for what we are doing:
#import numpy as np
from multiprocessing import cpu_count
from threading import Lock
... # etc.
if __name__ == "__main__":
update_input_1 = 3
update_input_2 = 4
input_1 = 0
input_2 = 1
# initialize class
my_class = MyClass(1, 2)
# total runtime (arbitrary)
runtime = int(10e6)
# update_time (arbitrary)
# we don't need overhead of numpy (remove import of numpy):
#update_time = np.array([10, 10e2, 15e4, 20e5])
update_time = [10, 10e2, 15e4, 20e5]
tasks_submitted = 0
lock = Lock()
execute_output = []
def execute_result(result):
global tasks_submitted
with lock:
tasks_submitted -= 1
# result is the return value from method execute
# do something with it, e.g. execute_output.append(result)
pass
update_output = []
def update_result(result):
global tasks_submitted
with lock:
tasks_submitted -= 1
# result is the return value from method update
# do something with it, e.g. update_output.append(result)
pass
n_processors = cpu_count()
with Pool() as pool:
for current_time in range(runtime):
# if time equals update time run both functions simultanously until update is complete
#if any(update_time == current_time):
if current_time in update_time:
# run both update and execute:
pool.apply_async(my_class.update, args=(update_input_1, update_input_2, current_time), callback=update_result)
with lock:
tasks_submitted += 1
pool.apply_async(my_class.execute, args=(input_1, input_2, current_time), callback=execute_result)
with lock:
tasks_submitted += 1
# increment input
input_1 += 1
input_2 += 2
while tasks_submitted > n_processors * 3:
time.sleep(.05)
# Ensure all tasks have completed:
pool.close()
pool.join()
assert(tasks_submitted == 0)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.