Multiprocessing with python on a single node using slurm
I am trying to run some parallel code on a cluster. The cluster uses slurm and my code is in python. The code uses multiple cores when I run it on my own machine. However, when I try to run the code on the cluster it is extremely slow and does not appear to be using multiple cores.
Here is the relevant code from python:
from multiprocessing import Pool
Nz_i=range(1,13)
p=Pool()
p.map(Err_Calc,Nz_i)
p.close()
p.join()
The function Err_Calc is defined earlier on; I don't think its definition is relevant.
The SBATCH script I am using to run the code on the cluster is the following:
#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH --ntasks-per-node 13
#SBATCH -t 03:10:00
module load python/intel_2.7.14
python Err_vs_Nz_Cl.py
The file Err_vs_Nz_Cl.py contains the code I showed above. I would expect this SBATCH script to provide me with 13 cores, but the code seems to be using only 1 core, or perhaps it is slow for some other reason. Does anyone know what's going wrong?
This may be wrong (I'm a newbie to this), but what happens if you change the --ntasks-per-node 13 argument to --cpus-per-task 13? I think the docs say that you need to explicitly specify the number of cpus in this way, else it will only run with one cpu.
Source: https://slurm.schedmd.com/sbatch.html
Because you're running without srun -n (which is correct for multiprocessing with its process-based "threading" model), you need another way to tell Slurm how many CPUs to use. This is done in your sbatch script with --ntasks=1 and --cpus-per-task=13 (or however many cores your node has).
Be cautious with multiprocessing. If you're not running on a whole node, or explicitly specifying the number of cores, it will attempt to run on all visible CPU cores, regardless of whether they've been allocated to you or not!