Using Python multiprocessing in Slurm, and which combination of ntasks or ncpus I need

I'm trying to run a python script on a slurm cluster, and I'm using python's built-in multiprocessing module.

I'm using quite a simple set-up, where for testing purposes the example is:

len(arg_list)
Out[2]: 5

threads = multiprocessing.Pool(5)
output = threads.map(func, arg_list)

So func is applied 5 times in parallel on the 5 arguments in arg_list. What I want to know is how to allocate the correct number of CPUs/tasks in Slurm for this to work as expected. This is what the relevant part of my slurm batch script looks like:

#!/bin/bash

# Runtime and memory
#SBATCH --time=90:00:00
#SBATCH --mem-per-cpu=2G

# For parallel jobs
#SBATCH --cpus-per-task=10
##SBATCH --nodes=2
#SBATCH --ntasks=1
##SBATCH --ntasks-per-node=4  

#### Your shell commands below this line ####

srun ./script_wrapper.py 'test'

As you can see, at the moment I have ntasks=1 and cpus-per-task=10. Note that the main bulk of func contains a scipy routine which tends to run on two cores (i.e. it uses 200% CPU, which is why I want 10 CPUs and not 5).

Is this the correct way to allocate resources for my purposes? At the moment the job takes a lot longer than expected (it looks more like it's running in a single thread).

Do I need to set ntasks=5 instead? My impression from the online documentation was that ntasks=5 would instead call srun ./script_wrapper.py 'test' five times, which is not what I want. Am I right in that assumption?

Also, is there a way to easily check things like CPU usage and the process IDs of the Python tasks launched by multiprocessing.Pool? At the moment I'm trying sacct -u <user> --format=JobID,JobName,MaxRSS,Elapsed,AveCPU, but the AveCPU and MaxRSS fields always come up empty for some reason, and while I see the first script as a process, I don't see the 5 others that should be spawned by multiprocessing. Example:

       JobID    JobName     MaxRSS    Elapsed     AveCPU 
------------ ---------- ---------- ---------- ---------- 
16260892             GP              00:13:07            
16260892.0   script_wr+              00:13:07            

Your Slurm task allocation looks correct to me. Python's multiprocessing will only run on a single machine, and it looks to me like you're allocating the 10 CPUs on one node correctly. What might be causing the problem is that multiprocessing's Pool.map by default works on "chunks" of the input list rather than on one element at a time. It does this to minimise overhead when tasks are short. To force multiprocessing to work on one element of the list at a time, set the chunksize parameter of map to 1, e.g.

threads.map(func, arg_list, 1)  # third positional argument is chunksize

See the multiprocessing documentation for more information.
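As a minimal self-contained sketch of that pattern (heavy_scipy_task, the five example arguments, and the pool size are placeholders standing in for your real func, arg_list, and worker count):

import multiprocessing

def heavy_scipy_task(x):
    # placeholder for the real SciPy-heavy function
    return x * x

if __name__ == "__main__":
    arg_list = [1, 2, 3, 4, 5]
    with multiprocessing.Pool(5) as pool:
        # chunksize=1 hands one element to each worker at a time, so all
        # five workers start immediately instead of a single worker
        # receiving the whole list as one chunk
        output = pool.map(heavy_scipy_task, arg_list, chunksize=1)
    print(output)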

Because you say that you're using a multithreaded version of SciPy, you may also want to check the relevant threading level for the underlying library. For instance, if your SciPy has been built against the Intel Math Kernel Library, try setting the OMP_NUM_THREADS and MKL_NUM_THREADS environment variables to make sure each process uses no more than 2 threads, making full use (and not over-use) of your allocated Slurm resources.
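One place to set them (a sketch, under the assumption that your SciPy build reads these variables from the environment; exporting them in the batch script before the srun line works just as well) is at the very top of the Python wrapper, before NumPy/SciPy are first imported:

import os

# must happen before the first import of numpy/scipy, otherwise the
# BLAS/MKL thread pools may already have been initialised
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"

import scipy  # imported only after the thread limits are in place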

EDIT: sacct is only going to give you running times for processes that were launched directly by srun, and not for any subprocesses. Hence in your case you'll only see the one process from the single srun command. To monitor the subprocesses you may have to look into monitoring tools that operate at the system level rather than through Slurm.
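As a rough illustration (again with placeholder func and arg_list), one simple way to find the subprocesses is to have each worker report its own PID, and then watch those PIDs on the compute node with ordinary system tools such as top or ps:

import multiprocessing
import os

def func(x):
    # each pool worker prints its own process ID so it can be tracked
    # with e.g. `ps -o pid,pcpu,rss -p <pid>` or `top` on the node
    print(f"worker pid={os.getpid()} handling {x}", flush=True)
    return x * x

if __name__ == "__main__":
    arg_list = [1, 2, 3, 4, 5]
    with multiprocessing.Pool(5) as pool:
        output = pool.map(func, arg_list, chunksize=1)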
