
Using Python multiprocessing library inside nested objects

I'm trying to use the multiprocessing library to parallelize some expensive calculations without blocking some other, much lighter ones. The two need to interact through some variables, although they may run at different paces.

To show this, I have created the following example, which works fine:

import multiprocessing
import time
import numpy as np


class SumClass:

    def __init__(self):

        self.result = 0.0
        self.p = None
        self.return_value = None

    def expensive_function(self, new_number, return_value):

        # Execute expensive calculation
        #######
        time.sleep(np.random.randint(5, 11))  # random_integers is removed in modern NumPy
        return_value.value = self.result + new_number
        #######

    def execute_function(self, new_number):

        print(' New number received: %f' % new_number)
        self.return_value = multiprocessing.Value("f", 0.0, lock=True)
        self.p = multiprocessing.Process(target=self.expensive_function, args=(new_number, self.return_value))
        self.p.start()

    def is_executing(self):

        if self.p is not None:

            if not self.p.is_alive():
                self.result = self.return_value.value
                self.p = None
                return False

            else:
                return True

        else:
            return False


if __name__ == '__main__':

    sum_obj = SumClass()
    current_value = 0

    while True:

        if not sum_obj.is_executing():

            # Randomly determine whether the function must be executed or not
            if np.random.rand() < 0.25:
                print('Current sum value: %f' % sum_obj.result)
                new_number = np.random.rand(1)[0]
                sum_obj.execute_function(new_number)

        # Execute other (light) stuff
        #######
        print('Executing other stuff')
        current_value += sum_obj.result * 0.1
        print('Current value: %f' % current_value)
        time.sleep(1)
        #######

Basically, in the main loop some light function is executed, and, depending on a random condition, some heavy work is sent to another process, provided the previous one has already finished. The heavy work is carried out by an object which needs to store some data between executions. Although expensive_function needs some time, the light function keeps on executing without being blocked.

Although the above code gets the job done, I'm wondering: is this the best/most appropriate way to do it?
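For comparison, the same non-blocking pattern can be sketched with concurrent.futures, which hands the result back through a Future instead of a shared Value (a sketch, not a drop-in replacement; the class name `FutureSum` is illustrative):

```python
from concurrent.futures import ProcessPoolExecutor


class FutureSum:
    """Sketch of SumClass using a ProcessPoolExecutor and a Future."""

    def __init__(self):
        self.result = 0.0
        self.executor = ProcessPoolExecutor(max_workers=1)
        self.future = None

    @staticmethod
    def expensive_function(result, new_number):
        # Runs in the worker process; returns a value instead of
        # writing into a multiprocessing.Value
        return result + new_number

    def execute_function(self, new_number):
        self.future = self.executor.submit(
            self.expensive_function, self.result, new_number)

    def is_executing(self):
        if self.future is None:
            return False
        if self.future.done():
            # result() also re-raises any exception from the worker
            self.result = self.future.result()
            self.future = None
            return False
        return True
```

One practical advantage over a bare Process is that an exception raised in the worker surfaces in the parent when `future.result()` is called, instead of disappearing silently.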

Besides, let us suppose the class SumClass holds an instance of another object, which also needs to store data. For example:

import multiprocessing
import time
import numpy as np


class Operator:

    def __init__(self):

        self.last_value = 1.0

    def operate(self, value):

        print('    Operation, last value: %f' % self.last_value)
        self.last_value *= value
        return self.last_value


class SumClass:

    def __init__(self):

        self.operator_obj = Operator()
        self.result = 0.0

        self.p = None
        self.return_value = None

    def expensive_function(self, new_number, return_value):

        # Execute expensive calculation
        #######
        time.sleep(np.random.randint(5, 11))  # random_integers is removed in modern NumPy

        # Apply operation
        number = self.operator_obj.operate(new_number)

        # Apply other operation
        return_value.value = self.result + number
        #######

    def execute_function(self, new_number):

        print('    New number received: %f' % new_number)
        self.return_value = multiprocessing.Value("f", 0.0, lock=True)
        self.p = multiprocessing.Process(target=self.expensive_function, args=(new_number, self.return_value))
        self.p.start()

    def is_executing(self):
        if self.p is not None:
            if not self.p.is_alive():
                self.result = self.return_value.value
                self.p = None
                return False
            else:
                return True
        else:
            return False


if __name__ == '__main__':

    sum_obj = SumClass()
    current_value = 0

    while True:

        if not sum_obj.is_executing():

            # Randomly determine whether the function must be executed or not
            if np.random.rand() < 0.25:
                print('Current sum value: %f' % sum_obj.result)
                new_number = np.random.rand(1)[0]
                sum_obj.execute_function(new_number)

        # Execute other (light) stuff
        #######
        print('Executing other stuff')
        current_value += sum_obj.result * 0.1
        print('Current value: %f' % current_value)
        time.sleep(1)
        #######

Now, inside expensive_function, a member function of the Operator object is used, and that object needs to store the number passed to it.

As expected, the member variable last_value does not change in the parent process, i.e. it does not keep any value.

Is there any way of doing this properly?

I can imagine rearranging everything so that I only need one class level, and it would work well. However, this is a toy example; in reality there are several levels of complex objects, so that would be hard.

Thank you very much in advance!

from concurrent.futures import ThreadPoolExecutor
from numba import jit
import requests
import timeit


def timer(number, repeat):
    def wrapper(func):
        runs = timeit.repeat(func, number=number, repeat=repeat)
        print(sum(runs) / len(runs))
    return wrapper


URL = "https://httpbin.org/uuid"

def fetch(session, url):
    # Note: numba's @jit(nopython=True) cannot compile I/O code such as
    # requests calls (it would raise a TypingError), so it is omitted here
    with session.get(url) as response:
        print(response.json()['uuid'])


@timer(1, 1)
def runner():
    with ThreadPoolExecutor(max_workers=25) as executor:
        with requests.Session() as session:
            # map submits all calls; leaving the with-block waits for completion
            executor.map(fetch, [session] * 100, [URL] * 100)

Maybe this helps.

I'm using ThreadPoolExecutor for multithreading; you can also use ProcessPoolExecutor.

For your compute-expensive operation you can use numba, which compiles numeric Python functions to cached machine code for faster execution. Note that it only handles numeric code in nopython mode; it cannot speed up I/O-bound functions such as the HTTP fetch above.

