How can I run a Python program with multiple inputs on the same node with Slurm?
I want to run a Python program 10 times and save the outputs to separate files named output_1, output_2, output_3, and so on. Each run can use 1 processor and 10 threads. I have access to 96 CPUs on a node, so I want to run all 10 of these jobs on the same node.
My Python code is invoked like this:
python mycode.py $file_number #file_number =1,2,3,4...
I was submitting jobs like this, but it uses 7 nodes:
#!/bin/bash
#SBATCH -J v2-array
#SBATCH -o x.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=7
#SBATCH --cpus-per-task=10
#SBATCH -t 72:00:00
#SBATCH --mail-type=FAIL
#SBATCH --array=0-6
python mycode.py $SLURM_ARRAY_TASK_ID
But I want to run this whole job on a single node instead of 7 nodes. How can I do this?
Remove the #SBATCH --ntasks-per-node=7 line. As written, you are requesting a total of 7 jobs x 7 tasks/job x 10 CPUs/task = 490 CPUs, while it seems you only need 7 jobs x 1 task/job x 10 CPUs/task = 70 CPUs.
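The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: total_cpus is a hypothetical helper, not part of Slurm, and the numbers are taken from the batch script in the question.

```python
def total_cpus(jobs: int, tasks_per_job: int, cpus_per_task: int) -> int:
    """Total CPUs requested across all jobs of an array (hypothetical helper)."""
    return jobs * tasks_per_job * cpus_per_task


# --array=0-6 creates 7 independent jobs (indices 0 through 6).
array_jobs = len(range(0, 7))

# With --ntasks-per-node=7 and --cpus-per-task=10, as in the question.
requested = total_cpus(array_jobs, tasks_per_job=7, cpus_per_task=10)

# With the --ntasks-per-node line removed (one task per job).
needed = total_cpus(array_jobs, tasks_per_job=1, cpus_per_task=10)

print(requested)  # 490
print(needed)     # 70
```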
Furthermore, in the above example, unless mycode.py is explicitly written to interact with Slurm, it will only be able to use 10 CPUs per job (compared with the 70 being allocated).
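As a small illustration of what "interacting with Slurm" can look like from inside a script, the sketch below reads SLURM_CPUS_PER_TASK, an environment variable that Slurm sets when --cpus-per-task is given. It does not make the extra tasks useful; it only shows one way a script might match its thread count to the CPUs of its own task. The helper name threads_from_slurm and the fallback value are assumptions, and how mycode.py actually creates its threads is not known from the question.

```python
import os


def threads_from_slurm(default: int = 1) -> int:
    """Return the per-task CPU count exported by Slurm, or `default` outside Slurm."""
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    # Fall back if the variable is missing or not a positive integer.
    if value.isdigit() and int(value) > 0:
        return int(value)
    return default


n_threads = threads_from_slurm()
# Hypothetical use: pass n_threads to whatever thread pool the script uses.
print(f"using {n_threads} thread(s)")
```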
Note that all jobs in an array are independent, and there is no way to guarantee that they start on the same node. They might start at different times, on different nodes, depending on the state of the queue. And if the queue is empty, placement depends on the Slurm configuration, which might favour scattering the jobs over the available nodes (this is a less-used feature, but it exists).
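If the runs really must share one node, one possible alternative (not part of the answer above) is to submit a single, non-array job with --nodes=1 that requests enough CPUs for the runs you want to execute at once, and to start the runs from a small launcher inside that job. The following is a minimal sketch under the assumption that mycode.py takes the file number as its only argument, as in the question; launch_all and its command parameter are hypothetical names, and the sketch does no explicit CPU binding.

```python
import subprocess
import sys


def launch_all(file_numbers, command=(sys.executable, "mycode.py")):
    """Start one process per file number, then wait for all of them.

    Returns a dict mapping each file number to its exit code.
    """
    # Popen returns immediately, so all runs execute concurrently.
    procs = {
        n: subprocess.Popen([*command, str(n)])
        for n in file_numbers
    }
    # wait() blocks until each process has finished and yields its exit code.
    return {n: p.wait() for n, p in procs.items()}
```

Calling launch_all(range(1, 11)) from a script started by the batch job would run file numbers 1 to 10 side by side; any non-zero value in the returned dict marks a run that failed.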