Python multithreading/multiprocessing and limiting CPU core affinity

Question

In Python, you can create new threads and processes to run a given task with multiprocessing.Pool, multiprocessing.pool.ThreadPool, concurrent.futures.ProcessPoolExecutor, and concurrent.futures.ThreadPoolExecutor.

By default, those threads/processes run with the same CPU core affinity as their parent process, which is normally all available cores/threads.

On Linux/Unix systems, it is possible to change the CPU core affinity using os.sched_setaffinity(pid, mask). The issue is that this is limited to just some Linux/Unix systems.
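To make that limitation concrete, here is a minimal sketch of a guarded call. The helper name restrict_to_cpus is made up for illustration; os.sched_setaffinity and os.sched_getaffinity are the real standard-library functions, and the hasattr check covers platforms that do not provide them.

```python
import os


def restrict_to_cpus(cpus, pid=0):
    """Restrict process `pid` (0 means the calling process) to the given CPU ids.

    Returns the resulting affinity set, or None on platforms that do not
    expose os.sched_setaffinity (for example Windows and macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        original = os.sched_getaffinity(0)   # CPUs we are currently allowed on
        print(restrict_to_cpus({min(original)}))
        restrict_to_cpus(original)           # restore the previous affinity
    else:
        print("CPU affinity is not available through os on this platform")
```

Note that this still pins the process to specific cores; it does not express "any two cores".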

There is the psutil Python library, which exposes the ability to set CPU core affinity with psutil.Process().cpu_affinity(CPUS), where CPUS is a list of integers, starting at 0, identifying which CPU cores/threads the process should use.

The issue is that the OS CPU scheduler is generally better placed to decide which core/thread a given process should run on than an end user choosing CPU cores/threads by hand.

My question is whether it's possible to create the thread/process pools and limit each instance to using X number of CPU cores/threads, without fixing their exact core affinity.

For example, if I have a PC with 16 cores and want to create 4 processes, I can create a multiprocessing.Pool(processes=4) object. Now if I wanted each of those 4 children to be limited to using only 2 CPU cores, I would have to use psutil to preemptively choose 2 CPU cores and assign them to that one process, remove those 2 CPU cores from the list of available CPU cores, and repeat this for all 4 processes.
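The manual bookkeeping described above could look roughly like the sketch below. It is only an illustration: partition_cpus, _pin_worker and _report are hypothetical helper names, and it uses os.sched_setaffinity in place of psutil so that it needs nothing outside the standard library (on platforms without that function the workers are simply left unpinned). Each pool worker takes one group of cores from a queue when it starts.

```python
import multiprocessing as mp
import os


def partition_cpus(cpus, n_workers, per_worker):
    """Split CPU ids into n_workers disjoint groups of per_worker ids each."""
    cpus = sorted(cpus)
    if n_workers * per_worker > len(cpus):
        raise ValueError("not enough CPUs for the requested partition")
    return [cpus[i * per_worker:(i + 1) * per_worker] for i in range(n_workers)]


def _pin_worker(queue):
    """Pool initializer: take one group of CPU ids and pin this worker to it."""
    cpus = queue.get()
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)


def _report(_):
    """Return the CPUs the current worker is allowed to run on, if known."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        usable = sorted(os.sched_getaffinity(0))
    else:
        usable = list(range(os.cpu_count() or 1))
    per_worker = 2
    n_workers = min(4, len(usable) // per_worker)
    if n_workers:
        queue = mp.Queue()
        for group in partition_cpus(usable, n_workers, per_worker):
            queue.put(group)
        with mp.Pool(n_workers, initializer=_pin_worker, initargs=(queue,)) as pool:
            print(pool.map(_report, range(n_workers)))
```

The groups are fixed up front in id order, which is exactly the drawback raised next: nothing here knows which cores are faster or which share a chiplet or socket.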

But this would not be ideal: what if I gave one process the two weakest cores in the system? Or what if those 2 cores were far apart physically (as on modern multi-chiplet AMD Ryzen CPUs or dual-socket systems)?

I would want to let the OS schedule 2 cores for each process automatically and juggle them as it sees fit, rather than having to manually set and unset the CPU cores for each process.

Is there a way this can be done in Python?

Answer 1

Some time ago I had a similar need, so I wrote a CPUResourceManager class to keep track of which cores I had assigned to a process. Here you would call the get_processors method to get a list of cores you are going to use. You set the core affinity using psutil, as you are already doing. When your process is done, return the cores using the free_processors method.

from enum import Enum
from multiprocessing import cpu_count  # needed for the default in __init__
from typing import NamedTuple


class CPUResponse(NamedTuple):
    """
    This is the response when the CPUResourceManager is asked for some cores.
    """
    success: bool     # whether or not there are enough cores available
    processors: list  # the list of processors to be used by the process


class ProcState(Enum):
    """
    Represents the state of a processor core.  This only
    represents what we are having the cores do, not what other unrelated
    processes on the machine are doing.
    """
    idle = 0
    busy = 1


class CPUResourceManager:
    def __init__(self, cpu_count=max(cpu_count() - 2, 1)) -> None:

        self.cpu_count = cpu_count

        self.processors = {i: ProcState.idle for i in range(self.cpu_count)}

    def cpu_avalaible_count(self):
        available = [
            p for p, state in self.processors.items() if state == ProcState.idle
        ]
        return len(available)

    def get_processors(self, count=1):
        """Get some available cores"""
        available = [
            p for p, state in self.processors.items() if state == ProcState.idle
        ]

        if len(available) >= count:
            cpus = available[:count]
            for p in cpus:
                self.processors[p] = ProcState.busy
            return CPUResponse(True, available[:count])
        else:
            return CPUResponse(False, [])

    def free_processors(self, processors: list):
        """return the cores when you are done"""
        for p in processors:
            if p in self.processors:
                self.processors[p] = ProcState.idle
            else:
                # manager was likely resized and this processor
                # should no longer be considered available
                pass

Answer 1
0 2022-01-04 21:22:08