
Run jobs in sequence rather than all at once using bash

So I work a lot with Gaussian 09 (the computational chemistry software) on a supercomputer.

To submit a job, I use the following command line:

 g09sub input.com -n 2 -m 4gb -t 200:00:00

Where n is the number of processors used, m is the memory requested, and t is the time requested.

I was wondering if there was a way to write a script that will submit the first 10 .com files in the folder and then submit another .com file as each finishes.

I have a script that will submit all the .com files in a folder at once, but I have a limit to how many jobs I can queue on the supercomputer I use.

The current script looks like this:

 #!/bin/bash
 #SBATCH --partition=shared
 for i in *.com; do
     g09sub "$i" -n 2 -m 4gb -t 200:00:00
 done

So 1.com, 2.com, 3.com, etc would be submitted all at the same time.

What I want is to have 1.com, 2.com, 3.com, 4.com, 5.com, 6.com, 7.com, 8.com, 9.com, and 10.com all start at the same time and then, as each of those finishes, have another .com file start, so that no more than 10 jobs from any one folder are running at the same time.

In case it is useful, each job creates a .log file when it finishes.

Though I am unsure if it is important, the supercomputer uses a PBS queuing system.

Try xargs or GNU parallel

xargs

ls *.com | xargs -P 10 -I {} g09sub {} -n 2 -m 4gb -t 200:00:00

Explanation:

  • -I {} tells xargs that {} stands for the input file name
  • -P 10 runs at most 10 jobs at once (note that -P must be given to xargs itself, not to g09sub)
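
If any of the file names contain spaces or other unusual characters, a null-delimited variant avoids parsing ls output; a minimal sketch, assuming GNU xargs:

# feed NUL-separated names so unusual filenames survive; at most 10 jobs at once
printf '%s\0' *.com | xargs -0 -P 10 -I {} g09sub {} -n 2 -m 4gb -t 200:00:00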

parallel

ls *.com | parallel -P 10 g09sub {} -n 2 -m 4gb -t 200:00:00 # GNU parallel supports -P too
ls *.com | parallel --jobs 10 g09sub {} -n 2 -m 4gb -t 200:00:00

Explanation:

  • {} represents the input file name
  • --jobs 10 runs at most 10 jobs at once (-P 10 is a synonym)
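
GNU parallel can also take the file names as arguments via :::, which avoids ls entirely, and --joblog keeps a record of each job; a sketch, assuming a reasonably recent GNU parallel:

# run 10 jobs at a time; jobs.log records runtime and exit status per job
parallel --jobs 10 --joblog jobs.log g09sub {} -n 2 -m 4gb -t 200:00:00 ::: *.com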

Not sure about its availability on your supercomputer, but the GNU bash manual offers a parallel example at the bottom of section 3.2.6, GNU Parallel.

There are ways to run commands in parallel that are not built into Bash. GNU Parallel is a tool to do just that.

...

Finally, Parallel can be used to run a sequence of shell commands in parallel, similar to 'cat file | bash'. It is not uncommon to take a list of filenames, create a series of shell commands to operate on them, and feed that list of commands to a shell. Parallel can speed this up. Assuming that file contains a list of shell commands, one per line,

parallel -j 10 < file

will evaluate the commands using the shell (since no explicit command is supplied as an argument), in blocks of ten shell jobs at a time.
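
For example, such a command file for the g09sub jobs could be generated and run like this (a sketch; %q is bash's shell-quoting format directive):

# write one g09sub command per .com file, then run them 10 at a time
for f in *.com; do
    printf 'g09sub %q -n 2 -m 4gb -t 200:00:00\n' "$f"
done > file
parallel -j 10 < file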


Where that option was not available to me, using the jobs builtin worked, rather crudely. E.g.:

for entry in *.com; do
   while [ "$(jobs -r | wc -l)" -gt 9 ]; do
     sleep 1    # seconds; your sleep may support an arbitrary floating point number
   done
   g09sub "${entry}" -n 2 -m 4gb -t 200:00:00 &
done
wait    # block until the remaining background jobs finish

$(jobs -r | wc -l) counts the number of jobs currently running in the background, i.e. those spawned with ${cmd} &.
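
If the system's bash is 4.3 or newer, wait -n blocks until any one background job exits, which replaces the one-second polling; a sketch of the same loop:

for entry in *.com; do
    # once 10 jobs are running, block until one of them exits
    while [ "$(jobs -r | wc -l)" -ge 10 ]; do
        wait -n
    done
    g09sub "${entry}" -n 2 -m 4gb -t 200:00:00 &
done
wait    # let the final batch finish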
