
Python: Running multiple functions simultaneously with different execution times

I'm working on a project that needs to run two different CPU-intensive functions, so a multiprocessing approach seems to be the way to go. The challenge I'm facing is that one function has a slower runtime than the other. For the sake of argument, let's say that execute has a runtime of 0.1 seconds while update takes a full second to run. The goal is that while update is running, execute will have calculated an output value 10 times. Once update has finished, it needs to pass a set of parameters to execute, which can then continue generating output with the new set of parameters. After some time, update needs to run again and once more generate a new set of parameters.

Furthermore, both functions will require a different set of input variables.

The image below should hopefully visualize my conundrum a bit better.

[Figure: function runtime visualisation]

From what I've gathered (https://zetcode.com/python/multiprocessing/), an asymmetric mapping approach might be the way to go, but it doesn't really seem to work. Any help is greatly appreciated.

Pseudocode

from multiprocessing import Pool
from datetime import datetime
import time
import numpy as np


class MyClass():
    def __init__(self, initial_parameter_1, initial_parameter_2):
        self.parameter_1 = initial_parameter_1
        self.parameter_2 = initial_parameter_2

    def execute(self, input_1, input_2, time_in):
        print('starting execute function for time:' + str(time_in))
        time.sleep(0.1)  # wait for 100 milliseconds
        # generate some output
        output = (self.parameter_1 * input_1) + (self.parameter_2 + input_2)
        print('exiting execute function')
        return output

    def update(self, update_input_1, update_input_2, time_in):
        print('starting update function for time:' + str(time_in))
        time.sleep(1)  # wait for 1 second
        # generate parameters
        self.parameter_1 += update_input_1
        self.parameter_2 += update_input_2
        print('exiting update function')

    def smap(f):
        return f()


if __name__ == "__main__":
    update_input_1 = 3
    update_input_2 = 4
    input_1 = 0
    input_2 = 1
    # initialize class
    my_class = MyClass(1, 2)

    # total runtime (arbitrary)
    runtime = int(10e6)
    # update_time (arbitrary)
    update_time = np.array([10, 10e2, 15e4, 20e5])

    for current_time in range(runtime):
        # if time equals update time, run both functions simultaneously until update is complete
        if any(update_time == current_time):
            with Pool() as pool:
                res = pool.map_async(my_class.smap, [my_class.execute(input_1, input_2, current_time),
                                                     my_class.update(update_input_1, update_input_2, current_time)])
        # otherwise run only execute
        else:
            output = my_class.execute(input_1, input_2, current_time)
        
        # increment input 
        input_1 += 1
        input_2 += 2

I confess to not being able to fully follow your code vis-à-vis your description. But I see some issues:

  1. Method update is not returning any value other than None, which is implicitly returned due to the lack of a return statement.
  2. Your with Pool() ...: block will call terminate upon block exit, which is immediately after your call to pool.map_async, which is non-blocking. But you have no provision to wait for the completion of this submitted task (terminate will most likely kill the running task before it completes).
  3. What you are passing to the map_async function is the worker function name and an iterable. But you are invoking the calls to execute and update in the current main process and using their return values as elements of the iterable, and these return values are definitely not functions suitable for passing to smap. So no multiprocessing is being done, and this is just plain wrong (see the sketch after this list for the correct calling pattern).
  4. You are also creating and destroying process pools over and over again. It is much better to create the process pool just once.
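
For illustration, here is a minimal sketch of the calling pattern that points 2 and 3 describe: the function itself (not its return value) is handed to the pool, and the returned AsyncResult is waited on before the pool is torn down. The standalone execute below is a hypothetical stand-in for the method in the question:

from multiprocessing import Pool
import time


def execute(input_1, input_2, time_in):
    # stand-in for MyClass.execute
    time.sleep(0.1)
    return input_1 + input_2 + time_in


if __name__ == "__main__":
    with Pool() as pool:
        # pass the function plus an iterable of argument tuples;
        # the pool invokes the function in its worker processes
        res = pool.starmap_async(execute, [(0, 1, t) for t in range(10)])
        outputs = res.get()  # block until all tasks finish, before terminate runs
    print(outputs)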

I would therefore recommend the following changes at the very least. But note that this code potentially generates tasks much faster than they can be completed, and you could have millions of tasks queued up to run given your current runtime value, which could be quite a strain on system resources such as memory. So I've inserted some code that throttles the rate of task submission so that the number of incomplete submitted tasks is never more than three times the number of CPU cores available.

# we won't need heavy-duty numpy for what we are doing:
#import numpy as np
from multiprocessing import cpu_count
from threading import Lock
... # etc.

if __name__ == "__main__":
    update_input_1 = 3
    update_input_2 = 4
    input_1 = 0
    input_2 = 1
    # initialize class
    my_class = MyClass(1, 2)

    # total runtime (arbitrary)
    runtime = int(10e6)
    # update_time (arbitrary)
    # we don't need overhead of numpy (remove import of numpy):
    #update_time = np.array([10, 10e2, 15e4, 20e5])
    update_time = [10, 10e2, 15e4, 20e5]

    tasks_submitted = 0
    lock = Lock()

    execute_output = []
    def execute_result(result):
        global tasks_submitted

        with lock:
            tasks_submitted -= 1
        # result is the return value from method execute
        # do something with it, e.g. execute_output.append(result)
        pass

    update_output = []
    def update_result(result):
        global tasks_submitted

        with lock:
            tasks_submitted -= 1
        # result is the return value from method update
        # do something with it, e.g. update_output.append(result)
        pass

    n_processors = cpu_count()
    with Pool() as pool:
        for current_time in range(runtime):
            # if time equals update time, run both functions simultaneously until update is complete
            #if any(update_time == current_time):
            if current_time in update_time:
                # run both update and execute:
                pool.apply_async(my_class.update, args=(update_input_1, update_input_2, current_time), callback=update_result)
                with lock:
                    tasks_submitted += 1
            pool.apply_async(my_class.execute, args=(input_1, input_2, current_time), callback=execute_result)
            with lock:
                tasks_submitted += 1

            # increment input
            input_1 += 1
            input_2 += 2
            while tasks_submitted > n_processors * 3:
                time.sleep(.05)
        # Ensure all tasks have completed:
        pool.close()
        pool.join()
        assert tasks_submitted == 0
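
As a footnote on the throttling design: instead of sleep-polling a shared counter, the same back-pressure can be expressed with a threading.BoundedSemaphore that blocks the submitting loop directly. This is just a sketch of that alternative, with some_task as a hypothetical stand-in for my_class.execute:

from multiprocessing import Pool, cpu_count
from threading import BoundedSemaphore
import time


def some_task(t):
    # hypothetical stand-in for my_class.execute
    time.sleep(0.1)
    return t * 2


if __name__ == "__main__":
    limit = BoundedSemaphore(cpu_count() * 3)
    results = []

    def on_done(result):
        results.append(result)
        limit.release()  # a task finished: free one submission slot

    def on_error(exc):
        limit.release()  # also free the slot if the task raised

    with Pool() as pool:
        for t in range(50):
            limit.acquire()  # blocks while cpu_count() * 3 tasks are in flight
            pool.apply_async(some_task, args=(t,), callback=on_done,
                             error_callback=on_error)
        pool.close()
        pool.join()
    print(len(results))  # 50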
