
Parallelizing a CPU-bound Python function

I have a CPU-bound Python function that takes around 15 seconds to run on a standard core. I need to run this function tens of thousands of times. The function input is a dataset around 10kB in size, so data transfer time should be negligible compared to the runtime. The functions do not need to communicate with each other. The return value is a small array.

I do not need to synchronize these functions at all. All I care about is that when one core finishes, it gets delegated a new job.

What is a good framework to start parallelizing this problem with? I would like to be able to run this on my own computers and also Amazon units.

Would Python's multiprocessing module do the trick? Would I be better off with something other than that?

If no communication is needed, the simplest way is Pool.map. It works like the built-in map function, but each item is processed in one of the child processes.

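import multiprocessing

def fu(chunk):
    # your CPU-bound code here; sum() is only a placeholder
    result = sum(chunk)
    return result

def produce_data(data, chunk_size=2):
    # split data into chunks; each chunk becomes one job
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

if __name__ == "__main__":
    data = list(range(10))  # placeholder input
    # create the pool after fu is defined so the workers can find it
    with multiprocessing.Pool(processes=4) as pool:
        result = pool.map(fu, produce_data(data))
    # result is an ordered list with one entry per chunk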
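
Since the question asks that a core be handed a new job as soon as it finishes, Pool.imap_unordered may be a closer fit than Pool.map: it yields each result as soon as it is ready, and with chunksize=1 an idle worker takes exactly one new task at a time. The following is a minimal sketch under assumptions, not a definitive implementation; `simulate`, `run_all`, and the range of seeds are hypothetical stand-ins for the real 15-second function and its inputs.

```python
import multiprocessing


def simulate(seed):
    # Hypothetical stand-in for the real CPU-bound function.
    # The input is returned with the result because results arrive out of order.
    total = sum(i * i for i in range(seed * 1000))
    return seed, total


def run_all(seeds, processes=4):
    results = {}
    with multiprocessing.Pool(processes=processes) as pool:
        # chunksize=1: each idle worker picks up one new job at a time.
        for seed, total in pool.imap_unordered(simulate, seeds, chunksize=1):
            results[seed] = total
    return results


if __name__ == "__main__":
    results = run_all(range(1, 9))
    print(len(results), "jobs finished")
```

Because the order of completion is not guaranteed, the sketch keys each result by its input instead of relying on list position.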

There are several other ways to process data with multiprocessing.
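
One of those other ways is Pool.apply_async, which queues jobs one at a time and returns an AsyncResult for each. The sketch below is illustrative only; `job` and `submit_all` are hypothetical names, and `job` is a trivial placeholder for the real function.

```python
import multiprocessing


def job(x):
    # Hypothetical placeholder for the real CPU-bound function.
    return x * x


def submit_all(inputs, processes=4):
    with multiprocessing.Pool(processes=processes) as pool:
        # Each call queues one job and returns an AsyncResult immediately.
        pending = [pool.apply_async(job, (x,)) for x in inputs]
        # get() blocks until that particular job has finished.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print(submit_all([1, 2, 3, 4]))
```

Collecting the AsyncResult objects in submission order keeps the results aligned with the inputs, at the cost of waiting on each job in turn.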
