Python multithreading/multiprocessing & limiting CPU core affinity
In Python, you can create new threads and processes to run a given task with multiprocessing.Pool, multiprocessing.pool.ThreadPool, concurrent.futures.ProcessPoolExecutor, and concurrent.futures.ThreadPoolExecutor.
By default, those threads/processes run with the same CPU core affinity as their parent process, which is normally every core/thread available.
On Linux/Unix systems, it is possible to change the CPU core affinity using os.sched_setaffinity(pid, mask). The issue is that this is only available on some Linux/Unix systems.
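As a minimal sketch of the os.sched_setaffinity call mentioned above: the helper name pin_current_process is hypothetical, and the hasattr guard is there because the function only exists on some platforms.

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to `cpus`.

    Returns the resulting affinity set, or None when the platform
    does not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

Calling pin_current_process({0, 1}) before creating a pool should make the pool's workers start with the same two-core mask, since children inherit the parent's affinity.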
The psutil Python library exposes the ability to set CPU core affinity with psutil.Process().cpu_affinity(CPUS), where CPUS is a list of integers, starting at 0, identifying which CPU cores/threads the process should use.
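A rough sketch of the psutil call described above. It assumes psutil is installed (it is a third-party package), and the helper names current_affinity and restrict_with_psutil are made up for illustration. Both return None when psutil is missing or the platform has no cpu_affinity support.

```python
try:
    import psutil  # third-party; assumed to be installed separately
except ImportError:
    psutil = None


def current_affinity():
    """Return the sorted list of CPU indices this process may use, or None."""
    if psutil is None:
        return None
    proc = psutil.Process()
    if not hasattr(proc, "cpu_affinity"):  # not provided on every platform
        return None
    return sorted(proc.cpu_affinity())


def restrict_with_psutil(cpus):
    """Limit the current process to `cpus`; return the new affinity or None."""
    if current_affinity() is None:
        return None
    proc = psutil.Process()
    proc.cpu_affinity(list(cpus))  # with an argument, the call sets the mask
    return sorted(proc.cpu_affinity())
```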
The issue is that the OS CPU scheduler can generally handle choosing which core/thread a given process should run on, so the end user should not have to decide which CPU cores/threads to use.
My question is whether it is possible to create the thread/process pools and limit each instance to X CPU cores/threads without fixing their exact core affinity.
For example, if I have a PC with 16 cores and want to create 4 processes, I can create a multiprocessing.Pool(processes=4) object. Now, if I wanted each of those 4 children to be limited to 2 CPU cores, I would have to use psutil to preemptively choose 2 CPU cores, assign them to one process, remove those 2 cores from the list of available CPU cores, and repeat for all 4 processes.
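The manual approach described above could be sketched roughly as follows. This is only an illustration under assumptions: partition_cores, _pin_worker and make_pinned_pool are hypothetical names, and os.sched_setaffinity is used in place of psutil so the block needs only the standard library. It fixes exact cores, so it shows the workaround rather than the scheduler-driven behaviour the question asks for.

```python
import multiprocessing as mp
import os


def partition_cores(cores, per_worker):
    """Split `cores` into disjoint groups of `per_worker`; leftovers are dropped."""
    cores = sorted(cores)
    usable = len(cores) - len(cores) % per_worker
    return [cores[i:i + per_worker] for i in range(0, usable, per_worker)]


def _pin_worker(group_queue):
    """Pool initializer: take one core group and pin this worker to it."""
    group = group_queue.get()
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, group)


def make_pinned_pool(workers, per_worker):
    """Create a Pool whose workers each get their own `per_worker` cores."""
    if hasattr(os, "sched_getaffinity"):
        available = os.sched_getaffinity(0)
    else:
        available = range(os.cpu_count() or 1)
    groups = partition_cores(available, per_worker)[:workers]
    if len(groups) < workers:
        raise ValueError("not enough cores for the requested layout")
    queue = mp.Queue()
    for group in groups:
        queue.put(group)
    # Caveat: a replacement worker (e.g. with maxtasksperchild) would block
    # in the initializer because the queue is empty by then.
    return mp.Pool(processes=workers, initializer=_pin_worker, initargs=(queue,))
```

On the 16-core machine from the example, make_pinned_pool(4, 2) would give the four workers the pairs [0, 1], [2, 3], [4, 5] and [6, 7], assuming all 16 cores are in the parent's affinity mask.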
But this would not be ideal: what if I gave one process the two weakest cores in the system? Or what if those 2 cores were physically far apart (as is the case with modern multi-chiplet AMD Ryzen CPUs or dual-socket systems)?
I would want to let the OS schedule 2 cores for each process automatically and juggle them as it sees fit, rather than having to manually set and unset the CPU cores for each process.
Is there a way this can be done in Python?
Some time ago I had a similar need, so I wrote a CPUResourceManager class to keep track of which cores I had assigned to a process. You call the get_processors method to get a list of cores you are going to use, then set the core affinity with psutil as you are already doing. When your process is done, return the cores with the free_processors method.
from enum import Enum
from multiprocessing import cpu_count
from typing import NamedTuple


class CPUResponse(NamedTuple):
    """
    This is the response when the CPUResourceManager is asked for some cores.
    """
    success: bool     # whether or not there are enough cores available
    processors: list  # the list of processors to be used by the process


class ProcState(Enum):
    """
    Represents the state of a processor core. This only
    represents what we are having the cores do, not what other unrelated
    processes on the machine are doing.
    """
    idle = 0
    busy = 1


class CPUResourceManager:
    # By default, leave two cores unmanaged but always manage at least one.
    def __init__(self, cpu_count=max(cpu_count() - 2, 1)) -> None:
        self.cpu_count = cpu_count
        self.processors = {i: ProcState.idle for i in range(self.cpu_count)}

    def cpu_avalaible_count(self):
        """Number of managed cores that are currently idle."""
        available = [
            p for p, state in self.processors.items() if state == ProcState.idle
        ]
        return len(available)

    def get_processors(self, count=1):
        """Get some available cores"""
        available = [
            p for p, state in self.processors.items() if state == ProcState.idle
        ]
        if len(available) >= count:
            cpus = available[:count]
            for p in cpus:
                self.processors[p] = ProcState.busy
            return CPUResponse(True, cpus)
        else:
            return CPUResponse(False, [])

    def free_processors(self, processors: list):
        """Return the cores when you are done"""
        for p in processors:
            if p in self.processors:
                self.processors[p] = ProcState.idle
            else:
                # manager was likely resized and this processor
                # should no longer be considered available
                pass
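
The get/set/free workflow described above can be wrapped so that cores are always returned, even if the work raises. This is only a sketch: MiniManager is a hypothetical, trimmed-down stand-in for CPUResourceManager so the block runs on its own, and reserved_cores is an invented helper name.

```python
from contextlib import contextmanager


class MiniManager:
    """Hypothetical stand-in exposing the same two methods as the class above."""

    def __init__(self, cpu_count):
        self.cpu_count = cpu_count
        self.idle = set(range(cpu_count))

    def get_processors(self, count=1):
        if len(self.idle) < count:
            return False, []
        cpus = sorted(self.idle)[:count]
        self.idle.difference_update(cpus)
        return True, cpus

    def free_processors(self, processors):
        # Ignore indices this manager does not track.
        self.idle.update(p for p in processors if 0 <= p < self.cpu_count)


@contextmanager
def reserved_cores(manager, count):
    """Reserve `count` cores and hand them back when the block exits."""
    ok, cpus = manager.get_processors(count)
    if not ok:
        raise RuntimeError("not enough idle cores")
    try:
        # Caller would set affinity here, e.g. with psutil's cpu_affinity(cpus).
        yield cpus
    finally:
        manager.free_processors(cpus)
```

Inside a "with reserved_cores(manager, 2) as cpus:" block, the caller would apply the affinity to its process and do the work; the finally clause frees the cores afterwards.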