How to implement multithreading/multiprocessing in Python by combining functions

Question

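The dictionary and the code are below, and the code works.

The question is mainly about multithreading. I know the code below could easily be rewritten in a simpler way, but the question is about multithreading, and I just created a working example to test with.
total(todos) is the main function.
user_count, title_count, and complete_count are independent of one another.
I need to implement multithreading/multiprocessing here.
def total(todos): is where the multithreading needs to go.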

todos = [{'userId': 1, 'id': 1, 'title': 'A', 'completed': False},
     {'userId': 1, 'id': 2, 'title': 'B ', 'completed': False},
     {'userId': 1, 'id': 1, 'title': 'C', 'completed': False},
     {'userId': 1, 'id': 2, 'title': 'A', 'completed': True},
     {'userId': 2, 'id': 1,'title': 'B', 'completed': False}]
# Shared module-level dictionary populated by the helper functions below.
super_dict = {}

def total(todos):
    ###### Multithreading need to implement ##########
    user_count = userid(todos)
    title_count = title(todos)
    complete_count = completed(todos)
    search_count_all = {**user_count, **title_count, **complete_count}
    return search_count_all
def userid(todos):    
    for d in todos:
        for l, m in d.items():  
            super_dict.setdefault(l, []).append(m)
    d = {k:len(set(v)) for k,v in super_dict.items()}
    return {"userid":d['userId']}
def title(todos):    
    for d in todos:
        for l, m in d.items():  
            super_dict.setdefault(l, []).append(m)
    d = {k:len(set(v)) for k,v in super_dict.items()}
    return {"title":d['title']}
def completed(todos):    
    for d in todos:
        for l, m in d.items():  
            super_dict.setdefault(l, []).append(m)
    d = {k:len(set(v)) for k,v in super_dict.items()}
    return {"completed":d['completed']}

total(todos)

Current output and expected output

{'userid': 2, 'title': 4, 'completed': 2}

Can we also do this with multiprocessing?

from joblib import Parallel, delayed

Answer 1

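As mandulaj's answer indicates, you can use the library as follows. I have simplified the example so it is easier to reason about.

I wrote 3 different functions, each of which just multiplies the input value by a different number. I then submitted them for concurrent execution using ThreadPoolExecutor.

Then you need to collect the result of each execution. This is done with the library's as_completed function. Because as_completed yields futures as they finish, the order of the results is not guaranteed.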

from concurrent.futures import ThreadPoolExecutor, as_completed


def func1(i):
    return i


def func2(i):
    return i*2


def func3(i):
    return i*3


funcs_to_execute = [func1, func2, func3]
arg = 1
res = []

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = {executor.submit(f, arg) for f in funcs_to_execute}

    for fut in as_completed(futures):
        res.append(fut.result())

print(res)

>[2, 1, 3]

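To apply this to your use case, the functions you submit are userid, title, and completed, with todos as the argument. The results of all of them end up in the res list.

However, your code has a few problems once you enable concurrency. It currently works for you because it runs sequentially. super_dict is shared, and there are no guarantees about its contents when it is accessed concurrently. You can fix this by creating a new list inside each function, returning it as the result, and aggregating at the end. So something like this: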

def userid(todos):
    res = list()
    for d in todos:
        for l, m in d.items():
            if l == 'userId':  # the key in todos is 'userId' (capital I)
                res.append(m)

    return {"userid": res}

Then merge the results of the futures (which are dictionaries) into a single super_dict.
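Putting the pieces above together, a minimal sketch might look like the following. It assumes each function keeps only local state and returns its count directly (rather than the raw list), and it uses a helper named unique_count that is hypothetical and not part of the original code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

todos = [{'userId': 1, 'id': 1, 'title': 'A', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'B ', 'completed': False},
         {'userId': 1, 'id': 1, 'title': 'C', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'A', 'completed': True},
         {'userId': 2, 'id': 1, 'title': 'B', 'completed': False}]


def unique_count(todos, key, label):
    # Hypothetical helper: counts distinct values of `key` using only
    # local state, so nothing is shared between threads.
    return {label: len({d[key] for d in todos})}


def userid(todos):
    return unique_count(todos, 'userId', 'userid')


def title(todos):
    return unique_count(todos, 'title', 'title')


def completed(todos):
    return unique_count(todos, 'completed', 'completed')


def total(todos):
    merged = {}
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(f, todos) for f in (userid, title, completed)]
        # Merge each partial dictionary in the main thread as it completes.
        for fut in as_completed(futures):
            merged.update(fut.result())
    return merged


print(total(todos))
```

The merge happens only in the main thread, so no locking is needed. The key order of the printed dictionary can vary with completion order, but the contents match the expected output from the question.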

Answer 2

Take a look at Python's concurrent.futures library. You can run the functions as threads:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(userid, todos)
    future2 = executor.submit(title, todos)
    # ...

    user_count = future1.result()
    title_count = future2.result()

Answer 3

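As I mentioned in my comment on your question, you should really be using multiprocessing rather than multithreading, since your tasks are CPU-bound. The problem then becomes sharing the global super_dict dictionary because, unlike with multithreading, each child process has its own memory space, so super_dict must now live in shared memory. The way to go about this is to use a dict instance returned by a multiprocessing.managers.SyncManager, and then to initialize each child process with a proxy for this managed dictionary when creating the process pool. This approach works on both Windows and Unix-based platforms. This version of a dictionary has some limitations. It does not seem to support the setdefault-then-append idiom, because the proxy does not seem to notice that the dictionary has actually been updated. So some adjustments have to be made.

But your code is very inefficient. Each of your functions userid, title, and completed does the same redundant processing of todos to populate super_dict, adding multiple occurrences of the user IDs, titles, and so on. This ultimately does not matter, because you are counting the number of unique values. But doing this redundantly wastes CPU cycles, and in a multiprocessing scenario you now have the problem of handling parallel updates to the dictionary, which buys you nothing. The following code therefore initializes super_dict only once, in the main process. Your functions userid, title, and completed were also doing more computation than they needed to, and this can be simplified further (see below).

Once you have done the work of making the code more efficient, it is not really worth bothering with multiprocessing for this particular example, because the overhead of creating the pool and the proxy dictionary outweighs any savings from running these three trivial functions in parallel. But this at least shows you the technique with a real example.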
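The limitation described above can be reproduced in a few lines. This is only a sketch, and the function name demo is made up for illustration. The list that setdefault returns through the proxy is a local copy, so appending to it never reaches the manager process, while reassigning the key does.

```python
from multiprocessing import Manager


def demo():
    with Manager() as manager:
        d = manager.dict()

        # setdefault stores an empty list in the manager process but hands
        # back a local copy, so this append only changes the copy.
        d.setdefault('userId', []).append(1)
        lost = list(d['userId'])

        # Read, modify, and assign back through the proxy: the update
        # is sent to the manager process.
        d['userId'] += [1]
        kept = list(d['userId'])
    return lost, kept


if __name__ == '__main__':
    lost, kept = demo()
    print('after setdefault/append:', lost)
    print('after reassignment:', kept)
```

The `if __name__ == '__main__':` guard matters on platforms that start child processes by re-importing the main module.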

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def init_pool(d):
    global super_dict
    super_dict = d

def total(todos):
    # Call to Manager() returns a SyncManager instance:
    with Manager() as manager:
        d = manager.dict()
        # Populate the managed dictionary once, in the main process:
        init_dict(d, todos)
        with ProcessPoolExecutor(max_workers=3, initializer=init_pool, initargs=(d,)) as executor:
            f1 = executor.submit(userid)
            f2 = executor.submit(title)
            f3 = executor.submit(completed)
            user_count = f1.result()
            title_count = f2.result()
            complete_count = f3.result()
            search_count_all = {**user_count, **title_count, **complete_count}
            return search_count_all

def init_dict(the_dict, todos):
    for d in todos:
        for l, m in d.items():
            # Cannot use setdefault:
            if l in the_dict:
                # Use this instead of append:
                the_dict[l] += [m]
            else:
                the_dict[l] = [m]
def userid():
    return {'userId': len({v for v in super_dict['userId']})}

def title():
    return {'title': len({v for v in super_dict['title']})}

def completed():
    return {'completed': len({v for v in super_dict['completed']})}

if __name__ == '__main__':
    todos = [{'userId': 1, 'id': 1, 'title': 'A', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'B ', 'completed': False},
         {'userId': 1, 'id': 1, 'title': 'C', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'A', 'completed': True},
         {'userId': 2, 'id': 1,'title': 'B', 'completed': False}]
    print(total(todos))
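For comparison, the simplification mentioned above might look something like this single-pass version, which needs neither a pool nor a shared dictionary. It is a sketch rather than the answer author's own code. The mapping named labels is introduced here for illustration, and it uses the key names from the question's expected output.

```python
todos = [{'userId': 1, 'id': 1, 'title': 'A', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'B ', 'completed': False},
         {'userId': 1, 'id': 1, 'title': 'C', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'A', 'completed': True},
         {'userId': 2, 'id': 1, 'title': 'B', 'completed': False}]

# Illustrative mapping from the key in each todo to the label in the result.
labels = {'userId': 'userid', 'title': 'title', 'completed': 'completed'}


def total(todos):
    # One set of distinct values per key of interest.
    seen = {key: set() for key in labels}
    # A single pass over the data fills all three sets.
    for d in todos:
        for key in labels:
            seen[key].add(d[key])
    return {label: len(seen[key]) for key, label in labels.items()}


print(total(todos))
```

For input this small, the single pass is likely to be faster than any pooled version, since it avoids process startup and proxy round trips entirely.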

3 answers

Answer 1
1 accepted 2021-01-09 07:45:47

Answer 2
0 2021-01-06 04:51:09

Answer 3
0 2021-01-12 14:00:59
