How to implement Multithreading/Multiprocessing in Python by merging the function

The dictionary and code are below, and they work.

  • The question is mainly about multithreading. I know the code below could easily be rewritten in a simpler way, but the ask is about multithreading; I just created one working example to test with.
  • total(todos) is the main function
  • user_count, title_count, complete_count are independent of each other
  • Multithreading/multiprocessing needs to be implemented
  • def total(todos): is the place where the multithreading needs to happen
todos = [{'userId': 1, 'id': 1, 'title': 'A', 'completed': False},
     {'userId': 1, 'id': 2, 'title': 'B ', 'completed': False},
     {'userId': 1, 'id': 1, 'title': 'C', 'completed': False},
     {'userId': 1, 'id': 2, 'title': 'A', 'completed': True},
     {'userId': 2, 'id': 1, 'title': 'B', 'completed': False}]

super_dict = {}  # shared accumulator that the helper functions below append into

def total(todos):
    ###### Multithreading needs to be implemented here ##########
    user_count = userid(todos)
    title_count = title(todos)
    complete_count = completed(todos)
    search_count_all = {**user_count, **title_count, **complete_count}
    return search_count_all
def userid(todos):    
    for d in todos:
        for l, m in d.items():  
            super_dict.setdefault(l, []).append(m)
    d = {k:len(set(v)) for k,v in super_dict.items()}
    return {"userid":d['userId']}
def title(todos):    
    for d in todos:
        for l, m in d.items():  
            super_dict.setdefault(l, []).append(m)
    d = {k:len(set(v)) for k,v in super_dict.items()}
    return {"title":d['title']}
def completed(todos):    
    for d in todos:
        for l, m in d.items():  
            super_dict.setdefault(l, []).append(m)
    d = {k:len(set(v)) for k,v in super_dict.items()}
    return {"completed":d['completed']}

total(todos)

Current output and expected output:

{'userid': 2, 'title': 4, 'completed': 2}

Can we also do multiprocessing, e.g. with:

from joblib import Parallel, delayed
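Something like this rough sketch is what I have in mind (untested; with joblib's default process-based backend, each function would run in its own worker process rather than sharing one super_dict):

results = Parallel(n_jobs=3)(delayed(f)(todos) for f in (userid, title, completed))
search_count_all = {k: v for r in results for k, v in r.items()}
print(search_count_all)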

As mandulaj's answer specifies, you can use the library as follows. I simplified the example so it's easier to reason about.

I wrote 3 different functions, each of which just multiplies the input value by a different number. Then, using the ThreadPoolExecutor, I submitted them for concurrent execution.

You then need to collect the results of each execution. This is done with the as_completed method from the library.

from concurrent.futures import ThreadPoolExecutor, as_completed


def func1(i):
    return i


def func2(i):
    return i*2


def func3(i):
    return i*3


funcs_to_execute = [func1, func2, func3]
arg = 1
res = []

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = {executor.submit(f, arg) for f in funcs_to_execute}

    for fut in as_completed(futures):
        res.append(fut.result())

print(res)

>[2, 1, 3]

To apply this to your use case, the functions you submit are userid, title, and completed, with the arg todos. The results of all of them end up in the res list container.

However, there are a few issues with your code once you enable concurrency. It currently works for you because it runs sequentially. super_dict is shared, and concurrent access to it is not guaranteed to be safe. You can fix this by creating a new list in every function, returning it as the result, and aggregating at the end. So something like this:

def userid(todos):
    res = list()
    for d in todos:
        for l, m in d.items():
            if l == 'userId':  # must match the key's exact case in todos
                res.append(m)

    return {"userid": res}

and then merge the results from the futures (which are dictionaries) into a single super_dict.
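A minimal sketch of the whole flow in that style, assuming title and completed are rewritten to be side-effect-free in the same way as userid above (these rewrites are illustrative, not from the original code):

from concurrent.futures import ThreadPoolExecutor, as_completed


def title(todos):
    # collect every title value into a local list
    return {"title": [m for d in todos for l, m in d.items() if l == 'title']}


def completed(todos):
    # collect every completed flag into a local list
    return {"completed": [m for d in todos for l, m in d.items() if l == 'completed']}


def total(todos):
    super_dict = {}
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(f, todos) for f in (userid, title, completed)}
        for fut in as_completed(futures):
            super_dict.update(fut.result())
    # count the unique values per key, matching the original output
    return {k: len(set(v)) for k, v in super_dict.items()}

With the sample todos this returns {'userid': 2, 'title': 4, 'completed': 2}.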

Check out Python's concurrent.futures library. You can run your functions as threads with:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(userid, todos)
    future2 = executor.submit(title, todos)
    # ...

    user_count = future1.result()
    title_count = future2.result()
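For example, a complete total() built on this pattern might look like the following sketch, submitting all three of your functions and merging the one-key dicts they return (the shared-super_dict caveat discussed in the other answer still applies):

import concurrent.futures

def total(todos):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future1 = executor.submit(userid, todos)
        future2 = executor.submit(title, todos)
        future3 = executor.submit(completed, todos)

        # each future returns a one-key dict; merge them into the final result
        return {**future1.result(), **future2.result(), **future3.result()}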

As I mentioned in my comment to your question, you really should be using multiprocessing rather than multithreading, since your tasks are CPU-intensive. The problem then becomes sharing a global super_dict dictionary: unlike with multithreading, each subprocess has its own memory space, so super_dict must now live in shared memory. The way to initialize this is to use a dict instance returned by a multiprocessing.SyncManager and then initialize each subprocess with a proxy to this managed dictionary when the process pool is created. This method works on both Windows and Unix-based platforms. This version of the dictionary has some limitations: it does not support the setdefault and append methods the way you would want, in that the proxy does not realize the dictionary has actually been updated. So a few adjustments have to be made.
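For example, here is a quick sketch of that limitation with a managed dict:

from multiprocessing import Manager

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        d['k'] = []
        d['k'].append(1)       # appends to a local copy; the proxy never sees it
        print(d['k'])          # -> []
        d['k'] = d['k'] + [1]  # re-assignment goes through the proxy and sticks
        print(d['k'])          # -> [1]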

But there is a great inefficiency in your code. Each of your functions userid, title and completed performs the same redundant processing of the todos list against the super_dict dictionary, adding multiple occurrences of user ids, titles, etc. This ultimately doesn't matter because you are counting the number of unique values, but it is wasteful of CPU cycles to do this redundantly, and in the multiprocessing scenario you would now have the problem of handling parallel updates to the dictionary, which buys you nothing. The following code therefore just initializes super_dict once in the main process. Your functions userid, title and completed are also doing more calculations than they need to, which can be further simplified (see below).

Once the code's efficiency has been improved, it does not really pay to bother with multiprocessing for this particular example, since the overhead of creating the pool and the proxy dictionary is greater than any savings from executing these three trivial functions in parallel. But it at least shows you the technique on a real example.

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def init_pool(d):
    global super_dict
    super_dict = d

def total(todos):
    # Call to Manager() returns a SyncManager instance:
    with Manager() as manager:
        d = manager.dict()
        init_dict(d)
        with ProcessPoolExecutor(max_workers=3, initializer=init_pool, initargs=(d,)) as executor:
            f1 = executor.submit(userid)
            f2 = executor.submit(title)
            f3 = executor.submit(completed)
            user_count = f1.result()
            title_count = f2.result()
            complete_count = f3.result()
            search_count_all = {**user_count, **title_count, **complete_count}
            return search_count_all

def init_dict(the_dict):
    # note: todos is read from the module-level scope of the main process
    for d in todos:
        for l, m in d.items():
            # Cannot use setdefault:
            if l in the_dict:
                # Use this instead of append:
                the_dict[l] += [m]
            else:
                the_dict[l] = [m]

def userid():
    return {'userId': len({v for v in super_dict['userId']})}

def title():
    return {'title': len({v for v in super_dict['title']})}

def completed():
    return {'completed': len({v for v in super_dict['completed']})}

if __name__ == '__main__':
    todos = [{'userId': 1, 'id': 1, 'title': 'A', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'B ', 'completed': False},
         {'userId': 1, 'id': 1, 'title': 'C', 'completed': False},
         {'userId': 1, 'id': 2, 'title': 'A', 'completed': True},
         {'userId': 2, 'id': 1,'title': 'B', 'completed': False}]
    print(total(todos))
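When run, this prints {'userId': 2, 'title': 4, 'completed': 2} (note that this version keys the count by 'userId' rather than 'userid').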
