Process group of files in parallel then compute in series using slurm

I need to convert every file in a particular directory then compile the results into a single computation on a system using Slurm. The work on each individual file takes about as long as the rest of the collective calculations. Therefore, I would like the individual conversions to happen simultaneously. Sequentially, this is what I need to do:

main.sh

#!/bin/bash
#SBATCH --account=millironx
#SBATCH --time=1-00:00:00
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=4

find . -maxdepth 1 -name "*.input.txt" \
  -exec ./convert-files.sh {} \;

./compile-results.sh *.output.txt

./compute.sh

echo "All Done!"

convert-files.sh

#!/bin/bash
# Simulate a time-intensive process
INPUT=$1
OUTPUT="${INPUT/input.txt/output.txt}"
sleep 10
date > "$OUTPUT"

While this system works, I generally process batches of 30+ files, and the computational time exceeds the time limit set by the administrator while only using one node. How can I process the files in parallel then compile and compute on them after they all have been completely processed?

What I've tried/considered

Adding srun to find -exec

find . -maxdepth 1 -name "*.input.txt" \
  -exec srun -n1 -N1 --exclusive ./convert-files.sh {} \;

find -exec waits for blocking processes, and srun is blocking, so this does exactly the same thing as the base code time-wise.

Using sbatch in the submission script

find . -maxdepth 1 -name "*.input.txt" \
  -exec sbatch ./convert-files.sh {} \;

This does not wait for the conversions to finish before starting the computations, and they consequently fail.
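
One way to make this approach block until everything is done is a sketch like the following, which assumes the cluster's sbatch supports the --wait option: each conversion is submitted with --wait in the background, and the script waits for all of them before compiling.

#!/bin/bash
# Sketch only: sbatch --wait blocks until its job finishes, so run each
# submission in the background and wait for all of them before moving on.
for f in ./*.input.txt; do
  sbatch --wait --account=millironx --time=05:00:00 --cpus-per-task=4 \
    ./convert-files.sh "$f" &
done
wait   # every conversion job has finished at this point

./compile-results.sh *.output.txt
./compute.sh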

Using GNU parallel

find . -maxdepth 1 -name "*.input.txt" | \
  parallel ./convert-files.sh

OR

find . -maxdepth 1 -name "*.input.txt" | \
  parallel srun -n1 -N1 --exclusive ./convert-files.sh

parallel can only "see" the number of CPUs on the current node, so it only processes four files at a time. Better, but still not what I'm looking for.
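
If the only problem is the default concurrency, GNU parallel's -j option can be set explicitly, for example to the number of Slurm tasks. This is a sketch, not a verified fix for the node-visibility issue:

find . -maxdepth 1 -name "*.input.txt" | \
  parallel -j "$SLURM_NTASKS" srun -n1 -N1 --exclusive ./convert-files.sh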

Using job arrays

This method sounds promising, but I can't figure out a way to make it work since the files I'm processing don't have a sequential number in their names.
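
One common workaround is to write the file names into a list once and index that list with $SLURM_ARRAY_TASK_ID. A sketch follows; filelist.txt and convert-array.sh are hypothetical names, not part of the original setup.

#!/bin/bash
# convert-array.sh -- hypothetical wrapper around convert-files.sh
#SBATCH --account=millironx
#SBATCH --time=05:00:00
#SBATCH --cpus-per-task=4
# Pick the Nth line of a pre-built file list using the array index
FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
./convert-files.sh "$FILE"

It would be submitted with something like:

find . -maxdepth 1 -name "*.input.txt" > filelist.txt
sbatch --array=1-$(wc -l < filelist.txt) convert-array.sh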

Submitting jobs separately using sbatch

At the terminal:

$ find . -maxdepth 1 -name "*.input.txt" \
>  -exec sbatch --account=millironx --time=05:00:00 --cpus-per-task=4 \
>  ./convert-files.sh {} \;

Five hours later:

$ srun --account=millironx --time=30:00 --cpus-per-task=4 \
>   ./compile-results.sh *.output.txt & \
>   sbatch --account=millironx --time=05:00:00 --cpus-per-task=4 \
>   ./compute.sh

This is the best strategy I've come up with so far, but it means I have to remember to check on the progress of the conversion batches and initiate the computation once they are complete.
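
If staying with this strategy, the manual check can at least be scripted. A rough sketch, assuming the conversion jobs keep the default job name derived from convert-files.sh, polls squeue and launches the compile step once no conversion jobs remain:

# Poll until no conversion jobs remain for this user, then compile.
while squeue -h -u "$USER" -o "%j" | grep -q "convert-files"; do
  sleep 60
done
sbatch --account=millironx --time=30:00 --cpus-per-task=4 \
  ./compile-results.sh *.output.txt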

Using sbatch with a dependency

At the terminal:

$ find . -maxdepth 1 -name "*.input.txt" \
>  -exec sbatch --account=millironx --time=05:00:00 --cpus-per-task=4 \
>  ./convert-files.sh {} \;
Submitted job xxxx01
Submitted job xxxx02
...
Submitted job xxxx45
$ sbatch --account=millironx --time=30:00 --cpus-per-task=4 \
>   --dependency=after:xxxx45 --job-name=compile_results \
>   ./compile-results.sh *.output.txt & \
>   sbatch --account=millironx --time=05:00:00 --cpus-per-task=4 \
>   --dependency=after:compile_results \
>   ./compute.sh

I haven't dared to try this yet, since I know that the last job is not guaranteed to be the last to finish.
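
A safer variant of this idea is sketched below; it leans on sbatch's --parsable and --wrap options. Every conversion job ID is collected and a single afterok dependency is hung on all of them, so nothing depends on guessing which job finishes last.

JOBIDS=""
for f in ./*.input.txt; do
  # --parsable prints just the job ID, appended here to a :-separated list
  ID=$(sbatch --parsable --account=millironx --time=05:00:00 --cpus-per-task=4 \
    ./convert-files.sh "$f")
  JOBIDS="${JOBIDS}:${ID}"
done

# compile-results.sh starts only after every conversion job has succeeded;
# the quoted glob expands when the wrapped command runs, not at submit time
COMPILE=$(sbatch --parsable --dependency=afterok${JOBIDS} \
  --account=millironx --time=30:00 --cpus-per-task=4 \
  --wrap './compile-results.sh *.output.txt')

sbatch --dependency=afterok:${COMPILE} \
  --account=millironx --time=05:00:00 --cpus-per-task=4 ./compute.sh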


This seems like it should be such an easy thing to do, but I haven't figured it out, yet.

If your $SLURM_NODELIST contains something similar to node1,node2,node34, then this might work:

find ... | parallel -S $SLURM_NODELIST convert_files
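
One caveat, depending on the cluster's configuration: $SLURM_NODELIST is often in Slurm's compressed form (e.g. node[01-04]), which parallel's -S option does not expand, and -S also needs passwordless ssh to the nodes. A sketch that expands the list first with scontrol:

NODES=$(scontrol show hostnames "$SLURM_NODELIST" | paste -sd, -)
find . -maxdepth 1 -name "*.input.txt" | parallel -S "$NODES" ./convert-files.sh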

The find . -maxdepth 1 -name "*.input.txt" | parallel srun -n1 -N1 --exclusive ./convert-files.sh way is probably the one to follow. But it seems ./convert-files.sh expects the filename as an argument, and you are trying to push it to stdin through the pipe. You need to use xargs, and as xargs can work in parallel, you do not need the parallel command.

Try:

find . -maxdepth 1 -name "*.input.txt" | xargs -L1 -P$SLURM_NTASKS srun -n1 -N1 --exclusive ./convert-files.sh

-L1 will split the result of find per line and feed it to convert-files.sh, spawning at most $SLURM_NTASKS processes at a time, and 'sending' each of them to a CPU on the nodes allocated by Slurm thanks to srun -n1 -N1 --exclusive.
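
Put together, the submission script from the question might look like the following sketch. Because the xargs pipeline only returns once every srun step has finished, the compile and compute steps can simply follow it:

#!/bin/bash
#SBATCH --account=millironx
#SBATCH --time=1-00:00:00
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=4

# Convert all inputs in parallel; xargs blocks until every srun has finished
find . -maxdepth 1 -name "*.input.txt" | \
  xargs -L1 -P"$SLURM_NTASKS" srun -n1 -N1 --exclusive ./convert-files.sh

# Only reached once all conversions are done
./compile-results.sh *.output.txt
./compute.sh

echo "All Done!"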
