
A single Python script involving np.linalg.eig is inexplicably taking 128 CPUs?

Note: The problem seems to be related to np.linalg.eig, eigsh, and scipy.sparse.linalg.eigsh. For scripts not involving these functions, everything on the AWS box works as expected.

The most basic script I have found with the problem is:

import numpy as np

num_iter = 100  # any fixed iteration count reproduces the problem

for i in range(num_iter):
    x = np.linalg.eig(np.random.rand(1000, 1000))

I'm having a very bizarre error on AWS where a basic Python script that calculates eigenvalues is using 100% of 64 cores (and is going no faster because of it).

Objective: Run computationally intensive Python code. The code is a parallel for loop, where each iteration is independent. I have two versions of this code: a basic version without multiprocessing, and one using the multiprocessing module.

Problem: The virtual machine is a c6i.32xlarge box on AWS with 64 cores/128 threads.

  • On my personal machine, the parallelized code is roughly ~6 times faster when using 6 cores. Using more than 1 core with the same code on the AWS box makes the runtime slower.

Inexplicable part:

  • I tried to get around this by launching multiple copies of the basic script with &, and this doesn't work either. Running n copies causes each of them to run roughly n times slower. Inexplicably, a single instance of the Python script uses all the cores of the machine: the Unix command top reports 6400% CPU usage (i.e. all of them), and AWS CPU usage monitoring confirms 100% usage of the machine. I don't see how this is possible given the GIL (one way to see where the extra threads come from is sketched below).
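
One way to see where the extra threads come from is to list the native threadpools loaded into the process. This is a minimal sketch, assuming the third-party threadpoolctl package is installed (pip install threadpoolctl); it also shows why the GIL is not a contradiction here: BLAS/LAPACK worker threads are native threads that run with the GIL released.

import numpy as np  # importing numpy loads its BLAS/LAPACK backend
from threadpoolctl import threadpool_info

# List every BLAS/OpenMP threadpool in this process and its thread count;
# on a 128-thread box an unconfigured backend typically defaults to all of them.
for pool in threadpool_info():
    print(pool["internal_api"], pool["num_threads"])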

Partial solution? Specifying the processor fixed the issue somewhat:

  • Running the commands taskset --cpu-list i python my_python_script.py & for i from 1 to n, the copies do indeed run in parallel, and the time is independent of n (for small n). The AWS monitor shows the CPU usage statistics you would expect. The speed of one pinned copy was the same as when the unpinned script was taking all the cores of the machine.

Note: The fact that the runtime on 1 processor is the same suggests the script was really running on 1 core all along, and the other cores were somehow being used erroneously (a Python-level equivalent of the taskset pinning is sketched below).
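
For reference, the same pinning can be done from inside the script with os.sched_setaffinity, the Linux-only call that taskset drives from the outside; a minimal sketch:

import os

import numpy as np

# Restrict the current process (pid 0 = self) to one logical CPU, mirroring
# `taskset --cpu-list 0`. Linux-only; the choice of CPU 0 is arbitrary.
os.sched_setaffinity(0, {0})

# Any BLAS worker threads spawned after this point inherit the affinity,
# so the eigendecompositions below are confined to a single core.
for i in range(100):
    x = np.linalg.eig(np.random.rand(1000, 1000))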

Question:

Why is my basic Python script taking all 64 cores of the AWS machine while not going any faster? How is this error even possible? And how can I get it to run simply with multiprocessing, without this weird taskset --cpu-list workaround?

I had the exact same problem on the Google Cloud Platform as well.

The basic script is very simple:

# my_module / my_other_module are placeholders for the real project code.
from my_module import my_np_and_scipy_function
from my_other_module import input_function

if __name__ == "__main__":
    output = []
    for i in range(num_iter):  # num_iter defined elsewhere
        result = my_np_and_scipy_function(kwds, param=input_function)
        output.extend(result)

With multiprocessing, it is:

import multiprocessing

from my_module import my_np_and_scipy_function
from my_other_module import input_function

if __name__ == "__main__":
    pool = multiprocessing.Pool(cpu_count)  # cpu_count defined elsewhere

    results = []
    for i in range(num_iter):
        result = pool.apply_async(
            my_np_and_scipy_function,
            kwds={"param": input_function, ...},  # "..." stands for the remaining kwargs
        )
        results.append(result)

    output = []
    for x in results:
        output.extend(x.get())

NumPy uses multiprocessing in some random functions, so it is possible. You can see here: https://github.com/numpy/numpy/search?q=multiprocessing

Following the answers in the post Limit number of threads in numpy, the numpy eig functions and the scripts work properly when the following lines of code are put at the top of the script:

import os

# These caps must be set before numpy (or scipy) is imported, or the BLAS
# backend will already have sized its threadpool.
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
