
Python job on slurm cluster, nodes vs cores

I have an extremely basic question that I somehow never managed to find an answer to. Let's assume that I have access to a cluster running slurm, and that I need to run a Python job on it. Let's also assume that my code has not been written to support multiprocessing. Do I have any reason to request multiple cores? Or should I stick to 1 node and 1 core?

Conversely, if I wanted to run the same script 5 times (with different input variables, for example), is there any difference between requesting 1 node and assigning 1 core to each job, and requesting 5 nodes with 1 core each?

If your Python script does not use multiple threads or processes, then yes, you should stick to one task (-n1) with one CPU (-c1) on one node (-N1). You don't need to specify that, as it is the default anyway. If you request more resources, they will just be wasted, as you won't use them.
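As a rough illustration of the single-core request described above, here is a minimal sketch of a Python file that doubles as the batch script. It assumes the cluster's sbatch accepts a script with a Python shebang and reads the `#SBATCH` comment lines at the top, and that it would be submitted with something like `sbatch job.py` (the file name and the `allocation` helper are hypothetical). The three directives only spell out the defaults, so they could be omitted.

```python
#!/usr/bin/env python3
# One node (-N1), one task (-n1), one CPU for that task (-c1).
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
"""Hypothetical single-core job script."""
import os


def allocation(env=os.environ):
    """Return (nodes, tasks, cpus_per_task) as reported by Slurm.

    Any of these variables may be missing (for instance outside of a
    job, or when the option was not given), so fall back to 1.
    """
    nodes = int(env.get("SLURM_JOB_NUM_NODES", 1))
    tasks = int(env.get("SLURM_NTASKS", 1))
    cpus = int(env.get("SLURM_CPUS_PER_TASK", 1))
    return nodes, tasks, cpus


if __name__ == "__main__":
    # Report what the job actually received before doing the real work.
    print("nodes=%d tasks=%d cpus_per_task=%d" % allocation())
```

Printing the allocation at the start of a job is a cheap way to confirm that the request matched what the script can actually use.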

However, some Python libraries do multithreaded calculations without you having to request it explicitly, so if you do some numpy calculations, you may benefit from multiple cores.
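If numpy's implicit threading matters for your job, one common approach is to match the thread count to what Slurm allocated. The sketch below is an assumption-laden example rather than a required step: it assumes numpy is linked against an OpenMP, OpenBLAS, or MKL backend that honours the environment variables listed, and that the job was submitted with -c so that SLURM_CPUS_PER_TASK is set. The helper name `set_thread_limits` is hypothetical.

```python
import os

# Environment variables read by common numpy backends (assumed here;
# which one applies depends on how numpy was built).
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def set_thread_limits(env=os.environ):
    """Copy the Slurm CPU allocation into the thread-count variables.

    Falls back to 1 thread when SLURM_CPUS_PER_TASK is not set.
    Returns the number of threads that was configured.
    """
    n = env.get("SLURM_CPUS_PER_TASK", "1")
    for var in THREAD_VARS:
        env[var] = n
    return int(n)


if __name__ == "__main__":
    threads = set_thread_limits()
    # numpy should only be imported after the limits are in place,
    # because the backends read these variables when they are loaded.
    print("configured", threads, "thread(s)")
```

With this in place, a job submitted with -c4 would let numpy use four threads, while a job left at the default stays on a single core.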

"Conversely, if I wanted to run the same script 5 times (with different input variables, for example), is there any difference between requesting 1 node and assigning 1 core to each job, and requesting 5 nodes with 1 core each?"

Yes, there is. If you ask Slurm for 5 nodes with 1 task per node, the job has to wait until 5 nodes each have room for a task. Even if, for example, a node with 20 CPUs is completely empty, your job won't run, as you explicitly asked for 5 nodes. So I would advise starting 5 jobs with -n1.
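Submitting five independent single-task jobs could look roughly like the sketch below. It assumes sbatch is on the PATH of the login node and uses its --wrap option to turn a command line into a job; the script name `my_script.py`, the input values, and the helper `build_sbatch_commands` are placeholders for illustration only.

```python
import shlex
import shutil
import subprocess


def build_sbatch_commands(script, inputs):
    """Build one `sbatch -n1` command (as an argv list) per input value."""
    commands = []
    for value in inputs:
        # The wrapped command is what each job runs on its single core.
        payload = "python3 {} {}".format(
            shlex.quote(script), shlex.quote(str(value))
        )
        commands.append(["sbatch", "-n1", "--wrap=" + payload])
    return commands


if __name__ == "__main__":
    for cmd in build_sbatch_commands("my_script.py", [1, 2, 3, 4, 5]):
        if shutil.which("sbatch"):
            # On a cluster: hand each job to the scheduler separately.
            subprocess.run(cmd, check=True)
        else:
            # Elsewhere: just show what would have been submitted.
            print(" ".join(shlex.quote(part) for part in cmd))
```

Because each job is scheduled on its own, Slurm can place all five on one idle node or spread them over several, whichever becomes free first.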
