Updating values in a list of class instances using multiprocessing

Question

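I have seen suggestions that I should not use multiprocessing for this at all, or that this is not how multiprocessing is meant to be used (memory sharing), but I will ask anyway to see whether I can understand it better.

I have a Person class and I want to update each person's age with the update_age method. Suppose I have millions of Person objects (rather than the 4 shown here) and I want to use multiprocessing to update the age of every Person instance I have.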

from multiprocessing import Pool


class Person():
    def __init__(self, age):
        self.age = age

    def update_age(self):
        self.age += 10


if __name__ == "__main__":
    people = [Person(1), Person(2), Person(3), Person(4)]

    p = Pool(4)

    # Each call pickles one Person and runs update_age in a worker process
    results = [p.apply_async(person.update_age, ()) for person in people]

    print([res.get() for res in results])

This gives me [None, None, None, None] because the method Person.update_age does not return anything. If I change the class to:

class Person():
    def __init__(self, age):
        self.age = age

    def update_age(self):
        return self.age + 10

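I get the correct response, [11, 12, 13, 14]. But this requires me to refactor every method in the class. If I had developed the Person class without scalability in mind, that would be a lot of work. Is there a way to keep the original structure of the class and still use multiprocessing to spread the workload across all my CPUs?

Edit: So, suppose I have a function like this in the parent process: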

def update_all():
  for person in people:
    person.update_age()

What is the correct way to update all the "people" using all CPUs?

Answer 1

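You can pass a custom function to the multiprocessing pool without modifying the Person class. In the current case, however, you do not really need multiprocessing, because your update_age is a very simple (computationally cheap) function.

Nevertheless, if your update_age function took, for example, 1 millisecond to complete, using multiprocessing would be reasonable.

Here is an example in which I added a small delay to the update and introduced an external function that is executed for each person: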

import time
import copy
import random
import multiprocessing as mp
from multiprocessing.pool import Pool


class Person():
    def __init__(self, age):
        self.age = age

    def update_age(self):
        self.age += 10
        time.sleep(0.001)


def get_updated_age(person: Person) -> int:
    person.update_age()
    return person.age


if __name__ == "__main__":
    people = [Person(random.randint(0, 60)) for _ in range(int(1e4))]
    # Independent copy, so both runs start from the same ages
    people_copy = copy.deepcopy(people)

    # Sequential baseline
    start = time.perf_counter()
    results_loop = []
    for person in people:
        person.update_age()
        results_loop.append(person.age)
    print(f'Apply with loop took {time.perf_counter() - start} seconds')

    # The pool is created before the timer starts and is cleaned up on exit
    with Pool(mp.cpu_count()) as p:
        start = time.perf_counter()
        results_map = p.map(get_updated_age, people_copy)
        print(f'Apply with pool map took {time.perf_counter() - start} seconds')

    print(results_loop == results_map)

I created 10,000 people with random ages and applied update_age both sequentially (in a loop) and using a multiprocessing pool. The output is:

Apply with loop took 13.693018896 seconds
Apply with pool map took 3.141904311000001 seconds
True

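So you can see that multiprocessing with an external function can be used to run expensive per-object operations more efficiently (the short sleep here stands in for real CPU-bound work).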
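One detail the example above does not show: each worker operates on a pickled copy of the Person it receives, so the objects in the parent process keep their old ages, and only the returned values come back. If the goal is for the parent's list to hold the updated people, one option is to return the whole object from the worker and replace the list contents. The following is a minimal sketch under that assumption; get_updated_person and this version of update_all are hypothetical helpers, not part of the original code.

```python
from multiprocessing import Pool


class Person:
    def __init__(self, age):
        self.age = age

    def update_age(self):
        self.age += 10


def get_updated_person(person):
    # Hypothetical helper: runs in a worker on a pickled copy of `person`
    # and sends the mutated copy back to the parent process.
    person.update_age()
    return person


def update_all(people, processes=4):
    # Replace the list contents in place, so code holding a reference to
    # the list sees the updated objects. References to individual Person
    # objects held elsewhere still point at the old instances.
    with Pool(processes) as pool:
        people[:] = pool.map(get_updated_person, people)


if __name__ == "__main__":
    people = [Person(1), Person(2), Person(3), Person(4)]
    update_all(people)
    print([person.age for person in people])  # [11, 12, 13, 14]
```

The Person class stays as originally written. The cost is that every object is pickled twice (once in each direction), which only pays off when the per-object work is expensive enough.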

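Otherwise, if the operation is not CPU-intensive (like your original +10), sequential code will outperform multiprocessing because of the cost of creating processes and passing objects through the pool. If I remove time.sleep(0.001), the output of the code above becomes: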

Apply with loop took 0.003054251000000008 seconds
Apply with pool map took 0.01680525399999999 seconds
True

So in this case, multiprocessing is of no use.
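Part of that overhead is per task. Pool.map splits the input into chunks and submits each chunk as one task, and the optional chunksize argument controls the approximate chunk size, whereas the apply_async loop in the question submits one task per person. A rough way to see how much this matters on a given machine is to time the same map with different chunk sizes. This is only a sketch: timed_map is a hypothetical helper, the chunk sizes are arbitrary, and the timings will vary from run to run.

```python
import time
from multiprocessing import Pool


class Person:
    def __init__(self, age):
        self.age = age

    def update_age(self):
        self.age += 10


def get_updated_age(person):
    person.update_age()
    return person.age


def timed_map(pool, people, chunksize):
    # Hypothetical helper: returns (elapsed seconds, ages) for one map call.
    start = time.perf_counter()
    ages = pool.map(get_updated_age, people, chunksize=chunksize)
    return time.perf_counter() - start, ages


if __name__ == "__main__":
    people = [Person(i % 60) for i in range(10_000)]
    with Pool(4) as pool:
        # Workers mutate copies, so every run starts from the same ages.
        for chunksize in (1, 100, 2500):
            elapsed, ages = timed_map(pool, people, chunksize)
            print(f"chunksize={chunksize}: {elapsed:.4f} seconds")
```

Even with large chunks, every Person still has to be pickled and sent to a worker, so for work as cheap as +10 the plain loop is likely to remain faster.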

Solution 1
2 · Accepted · 2020-11-23 12:21:10
