
Multiprocessing with python on a single node using slurm

I am trying to run some parallel code on a cluster. The cluster uses slurm and my code is in python. The code uses multiple cores when I run it on my own machine. However, when I run it on the cluster it is extremely slow and does not appear to be using multiple cores.

Here is the relevant python code:

from multiprocessing import Pool

Nz_i=range(1,13)

p=Pool()
p.map(Err_Calc,Nz_i)
p.close()
p.join()

The function Err_Calc is defined earlier in the script. I don't think its definition is relevant.

The SBATCH script I am using to run the code on the cluster is the following:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH --ntasks-per-node 13
#SBATCH -t 03:10:00

module load python/intel_2.7.14

python Err_vs_Nz_Cl.py 

The file Err_vs_Nz_Cl.py contains the code I showed above. I would expect this SBATCH script to give me 13 cores, but the code seems to be using only 1 core, or perhaps it is slow for some other reason. Does anyone know what's going wrong?

This may be wrong (I'm a newbie to this), but what happens if you change the --ntasks-per-node 13 argument to --cpus-per-task 13? I think the docs say that you need to request the number of CPUs explicitly in this way, otherwise each task gets only one CPU.

Source: https://slurm.schedmd.com/sbatch.html

Because you're running without srun -n (which is correct for multiprocessing, with its process-based "threading" model), you need another way to tell Slurm how many CPUs the job should get. This is done in your sbatch script with --ntasks=1 and --cpus-per-task=13 (or however many cores your node has).
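To check what the job actually received, one option is to print the allocation from inside the job before starting the pool. The snippet below is only a sketch: describe_allocation is a hypothetical helper name, the SLURM_* environment variables are only set when the corresponding options were requested, and os.sched_getaffinity exists only on Linux with Python 3, so it is guarded.

```python
import os
import multiprocessing


def describe_allocation(env=None):
    """Collect what Slurm reports for this job next to what Python sees.

    `env` defaults to os.environ; passing a dict makes the helper easy to try
    outside a Slurm job. Variables that are not set come back as None.
    """
    env = os.environ if env is None else env
    report = {
        "SLURM_NTASKS": env.get("SLURM_NTASKS"),
        "SLURM_CPUS_PER_TASK": env.get("SLURM_CPUS_PER_TASK"),
        "SLURM_JOB_CPUS_PER_NODE": env.get("SLURM_JOB_CPUS_PER_NODE"),
        # Number of CPUs on the node, whether or not they belong to this job.
        "visible_cpus": multiprocessing.cpu_count(),
    }
    # CPUs this process may run on; Linux and Python 3 only.
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        report["usable_cpus"] = None
    return report


if __name__ == "__main__":
    for key, value in sorted(describe_allocation().items()):
        print("%s = %s" % (key, value))
```

If SLURM_CPUS_PER_TASK is missing or usable_cpus is 1 while visible_cpus is large, that would be consistent with the job having been given a single core.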

Be cautious with multiprocessing. Pool() called with no arguments starts one worker per CPU it detects, so unless you are running on a whole node or explicitly specify the number of cores, it will attempt to run on all visible CPU cores, regardless of whether they've been allocated to you or not!
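One way to avoid this is to size the pool from the allocation rather than from the node. This is a minimal sketch, not a definitive recipe: pool_size and run_in_pool are hypothetical helper names, and it assumes the job was submitted with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK. Inside a Slurm job without that variable it falls back to 1 worker, and outside Slurm it falls back to cpu_count().

```python
import os
from multiprocessing import Pool, cpu_count


def pool_size(env=None):
    """Return how many worker processes to start.

    Prefers SLURM_CPUS_PER_TASK. Inside a Slurm job without that variable it
    returns 1 (assumption: one CPU per task was granted). Outside Slurm it
    returns cpu_count(), as on a personal machine.
    """
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    if "SLURM_JOB_ID" in env:
        return 1
    return cpu_count()


def run_in_pool(func, items, env=None):
    """Map func over items with a pool limited to the allocated CPUs."""
    pool = Pool(processes=pool_size(env))
    try:
        return pool.map(func, items)
    finally:
        # Mirror the close/join pattern used in the question.
        pool.close()
        pool.join()
```

In the script from the question, this would amount to replacing p=Pool() with p=Pool(processes=pool_size()), or calling run_in_pool(Err_Calc, Nz_i).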
