
Wait for a child process to finish before starting a new child process

I have to process ten very big files. Each file takes about two days to process with my_profiler. I can parallelize the work so that my_profiler runs on each file separately, using all of my system's cores. My approach to parallelizing the work is to run three processes in three different terminals at a time. I can't process more than four files at once, or my system starts getting unresponsive (hangs).

My goal is to write a shell script which processes the ten files in batches of three. Once processing of one file finishes, its terminal should be closed and processing of a new file should start in another terminal. As the terminal I want to use gnome-terminal.

Currently I am stuck with the following script, which runs all processes in parallel:

for j in $jobs
do
    gnome-terminal -- bash -c "my_profiler $j"
done

How can I wait until a shell script running in an instance of gnome-terminal finishes?

My first thought was that I might need to send a signal from the old terminals once their job is finished.

I am not quite sure why you have to start a new gnome-terminal for each job, but you can use xargs in combination with -P [1]. To run three instances of my_profiler in parallel at the same time:

echo "${jobs}" | xargs -P3 -I{} gnome-terminal --wait -e 'bash -c "my_profiler {}"'

The important part here is to start gnome-terminal with --wait; otherwise the terminal daemonizes itself, which has the effect that xargs starts the next process immediately. --wait was introduced with gnome-terminal 3.27.1.

The -I{} option of xargs defines a placeholder ({}) which xargs replaces with a filename before running the command [2]. In the example above, xargs scans the command string (gnome-terminal --wait -e 'bash -c "my_profiler {}"') for {} and replaces the found instances with the first file coming from stdin (echo "${jobs}" | ...). It then executes the resulting string. xargs does this three times (-P3) before it starts waiting for at least one process to finish. When that happens, xargs starts the next process.
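The replacement and batching can be observed with a harmless stand-in command in place of gnome-terminal (the file names here are made up):

```shell
# Run at most 3 "jobs" at a time; for each input line, xargs replaces
# {} with that line before executing the command.
printf '%s\n' file1 file2 file3 file4 file5 \
    | xargs -P3 -I{} sh -c 'echo "processing $1"' -- {}
```

The five "processing fileN" lines appear in an indeterminate order, because up to three invocations run concurrently.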


[1]: from man xargs

-P max-procs, --max-procs=max-procs

Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done. While xargs is running, you can send its process a SIGUSR1 signal to increase the number of commands to run simultaneously, or a SIGUSR2 to decrease the number. You cannot increase it above an implementation-defined limit (which is shown with --show-limits). You cannot decrease it below 1. xargs never terminates its commands; when asked to decrease, it merely waits for more than one existing command to terminate before starting another.

Please note that it is up to the called processes to properly manage parallel access to shared resources. For example, if more than one of them tries to print to stdout, the output will be produced in an indeterminate order (and very likely mixed up) unless the processes collaborate in some way to prevent this. Using some kind of locking scheme is one way to prevent such problems. In general, using a locking scheme will help ensure correct output but reduce performance. If you don't want to tolerate the performance difference, simply arrange for each process to produce a separate output file (or otherwise use separate resources).

[2]: from man xargs

-I replace-str

Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.

If I understand this right...

I think you could use wait $job to wait for a job to complete.

Here's an example. The following script will start at most 3 jobs in parallel, in the background. Once one of these 3 jobs ends, it will start another one.

#!/bin/bash

THREADS='3';
FILES=$(find source_dir_path -type f -name "your files*")

for file in ${FILES}
do
 NUMPROC=$(ps -ef | grep -i "[y]our_process_name" | wc -l | tr -d ' ')
 while (( NUMPROC >= THREADS ))
 do
  sleep 60
  NUMPROC=$(ps -ef | grep -i "[y]our_process_name" | wc -l | tr -d ' ')
 done
 echo "Starting: " $file;
 #your file processing command below, I assume this would be:
 my_profiler $file &
done

for job in `jobs -p`
do
 wait $job
done
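On bash 4.3 or newer, the polling loop can be avoided with wait -n, which blocks until any single background job exits. A minimal sketch of the same batching idea, with a stub process_one function standing in for my_profiler:

```shell
#!/bin/bash
# Keep at most 3 jobs running; start the next one as soon as any finishes.
# process_one is a stand-in for "my_profiler $file".
process_one() { sleep 0.2; echo "finished $1"; }

max_jobs=3
for file in a b c d e f g h i j ; do
    while (( $(jobs -rp | wc -l) >= max_jobs )) ; do
        wait -n            # bash 4.3+: wait for any single job to exit
    done
    process_one "$file" &
done
wait                       # wait for the remaining jobs
echo "all jobs done"
```

This avoids both the sleep 60 polling interval and the process-table grep, since bash tracks its own background jobs.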

Each file takes about 2 days to process

Running them in a graphical window is the most expensive option there is. Flushing the terminal window can be expensive; if your process outputs a lot of stdout (like cp -vr /bigfolder /anotherfolder), you will see a performance difference. Also, running an X application as a background job makes it dependent on the X server: if your X server crashes, you lose your work. All of that is unrelated to the work you are actually trying to do.

For single-run workloads (run & forget), I would go with xargs -P jobs. I would add some ionice nice to keep the system usable while the processes are running. Each process's stdout output could be discarded, or interleaved with a prefix added, e.g. with | sed 's/^/'"${job}: "'/', and saved to a file. Or better, redirected to the system logger with | logger.
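For example, a per-job prefix (with a made-up job name) keeps the interleaved output of parallel jobs attributable:

```shell
# Tag every output line of one job with its job name, so that mixed
# output from several parallel jobs can still be told apart.
job="file1"
printf 'step 1\nstep 2\n' | sed "s/^/${job}: /"
```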

If it were a one-time job, I would open a tmux or screen session and type:

printf "%s\n" $jobs | ionice nice xargs -t -n1 -P$(nproc) sh -c 'my_profiler "$1"' --

and then detach the tmux or screen session, to be re-attached later. Set an alarm on my phone for 3 days and check back on it then.

The ionice nice will make your system somewhat usable while the processes are running. The -P$(nproc) limits the number of processes to the number of cores. If my_profiler is highly I/O-bound and you don't care about system performance while running the jobs, it is sometimes advisable to run more jobs than cores, as they will block on I/O anyway.

You could append | logger -p local0.info --id=$$ after xargs, or inside the child sh shell inside xargs, so that the output is redirected to the system log with priority local0.info and tagged with the PID of the current shell.

A way better option, in my opinion, is to create a systemd service file. Create a my_profiler@.service file like this:

[Unit]
Description=Run my_profiler for %i
[Service]
# full path to my_profiler
ExecStart=/usr/bin/my_profiler %i
CPUSchedulingPolicy=batch
Nice=19
IOSchedulingClass=best-effort

Add the service to the unit search path with systemctl link ./my_profiler@.service, or create it as a drop-in unit file inside /run/systemd/system. Then start the jobs with printf "%s\n" $jobs | xargs -I{} -t systemctl start my_profiler@{}.service.

That way I could get all the logs I need from journalctl -u my_profiler@job.service, and the logs will never fill 100% of my disk space, because journalctl takes care of that. Errors can easily be reported and inspected with systemctl --failed or systemctl status my_profiler@job.service.

Yet another approach, because counting processes based on a substring in the process table might be problematic: especially if you start subprocesses in your script, the count can be unreliable. You also wrote that the processes run for 2 days, so you might occasionally have the problem that you need to restart from a previous point.

You could do that in a slightly more complicated way. You need one script that starts your processes and watches whether they still look healthy (i.e. the process did not crash; otherwise it restarts them). This requires an init script, a script which fills the process queue, and a small modification of your profiler script.

Script 1: initialize the job queue

Create a job directory with one file per job in order to track the progress automatically. If all jobs are processed without problems, it will be deleted automatically at the end.

#!/bin/bash
tmpdir=/tmp/
jobdir=${tmpdir}/jobs
num_jobs=3
mkdir -p ${jobdir}

i=0
for file in $jobs ; do
    ((i++))
    echo "${file}" > ${jobdir}/${i}.open
done

Script 2: starting the actual processes

#!/bin/bash
tmpdir=/tmp/
jobdir=${tmpdir}/jobs
num_jobs=3

function fill_process_queue() {
    # arg1: num_jobs
    # arg2: jobdir
    # arg3...: open jobs
    num_jobs=$1
    jobdir=$2
    shift 2
    while [[ $(ls ${jobdir}/*.running.* 2>/dev/null | wc -l) -lt ${num_jobs} && $# -gt 0 ]] ; do
        job_file=$1
        shift 1
        [[ -f ${job_file} ]] || continue  # skip unexpanded globs (no .open files left)
        gnome-terminal -- bash -c "my_profiler $(cat ${job_file}) ${job_file}"
        # now give the called job some time to
        # mark its territory (rename the job file)
        sleep 5s
    done
}

while [[ -n $(ls ${jobdir}) ]] ; do
    # still files present, so first check if
    # all started processes are still running
    for started_job in $(ls ${jobdir}/*.running.* 2>/dev/null) ; do
        # check if the running processes are still alive
        started_job=$(basename ${started_job})
        pid="${started_job##*.running.}"
        jobid="${started_job%%.running.*}"
        if ! kill -0 ${pid} 2> /dev/null ; then
            # process is not running anymore, so move the job
            # back into the queue to be restarted
            # (don't worry, kill -0 doesn't harm your process)
            mv ${jobdir}/${started_job} ${jobdir}/${jobid}.open
        fi
    done
    done
    fill_process_queue ${num_jobs} ${jobdir} ${jobdir}/*.open
    sleep 30s
done
# if the directory is empty, it will be removed automatically by rmdir, if non-empty, it remains
rmdir ${jobdir}

Changes to the profiler script

The profiler script needs to rename the job file at the start so that it includes the PID of the profiler script, and needs to delete the file once it has successfully finished. The file name is passed as an extra argument after the job argument (so it should be argument 2). These changes look like:

# at the very beginning of your script
process_file=${2%.open}.running.$$
mv $2 ${process_file}

# at the very end of your script, if everything went fine
rm ${process_file}
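Put together, the claim/release protocol can be exercised with a stub standing in for the real my_profiler (the file names here are illustrative):

```shell
#!/bin/bash
# Simulate the modified profiler script: claim the job file by renaming
# it to <id>.running.<pid>, process it, then delete it on success.
my_profiler() { echo "profiling $1"; }   # stand-in for the real binary

jobdir=$(mktemp -d)
echo "/data/bigfile1" > "${jobdir}/1.open"

job_file="${jobdir}/1.open"
process_file="${job_file%.open}.running.$$"
mv "${job_file}" "${process_file}"       # claim: job is now marked running
my_profiler "$(cat "${process_file}")"   # process the file named inside
rm "${process_file}"                     # release: success, remove job file
rmdir "${jobdir}"                        # queue directory is empty again
```

If the profiler crashes instead, the .running.PID file is left behind, and the watcher loop in script 2 detects the dead PID and moves the file back into the queue.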
