
Run bash shell in parallel and wait

I have 100 files in a directory and want to process each one in several steps, where step1 is time-consuming. The pseudocode looks like:

for filename in ~/dir/*; do
  run_step1 "$filename" > "${filename}.out" &
done

for outfile in ~/dir/*.out; do
  run_step2 "$outfile" > "${outfile}.result"
done

My question is: how can I check whether step1 is complete for a given input file? I used to use Thread.Join in C#, but I am not sure whether bash has an equivalent.

It looks like you want:

for filename in ~/dir/*
do
    (
    run_step1 "$filename" > "${filename}.out"
    run_step2 "${filename}.out" > "${filename}.result"
    ) &
done
wait

This processes each file in a separate sub-shell, running first step 1 then step 2 on each file, but processing multiple files in parallel.

About the only issue you'll need to worry about is making sure you don't run too many processes in parallel. You might want to consider GNU parallel.
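If GNU parallel isn't to hand, one plain-bash way to cap concurrency is to block with wait -n (bash 4.3+) whenever the job count reaches a limit. A minimal self-contained sketch of that idea; run_step1/run_step2 here are placeholder stubs, and /tmp/dir_demo stands in for ~/dir:

```shell
#!/usr/bin/env bash
# Sketch: run at most $max_jobs per-file pipelines at once (needs bash 4.3+ for wait -n).
# run_step1/run_step2 are placeholder stubs for the real commands.
run_step1() { tr 'a-z' 'A-Z' < "$1"; }   # stand-in for the slow step
run_step2() { wc -c < "$1"; }            # stand-in for the second step

mkdir -p /tmp/dir_demo
printf 'abc' > /tmp/dir_demo/a.txt
printf 'de'  > /tmp/dir_demo/b.txt

max_jobs=4
for filename in /tmp/dir_demo/*.txt; do
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n                            # block until any one job exits
    done
    (
        run_step1 "$filename" > "${filename}.out"
        run_step2 "${filename}.out" > "${filename}.out.result"
    ) &
done
wait                                       # wait for the stragglers
```

The inner while loop only kicks in once $max_jobs sub-shells are running, so a fast machine with few files never blocks until the final wait.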

You might want to write a trivial script (doit.sh, perhaps):

run_step1 "$1" > "$1.out"
run_step2 "$1.out" > "$1.result"

and then invoke that script from parallel, one file per invocation.
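A sketch of what that invocation could look like. Since GNU parallel may not be installed everywhere, the example also shows a portable xargs -P fan-out with the same one-file-per-invocation semantics; the stubbed doit.sh and the /tmp/par_demo paths are illustrative only:

```shell
#!/usr/bin/env bash
# Sketch: a helper script invoked once per file, fanned out in parallel.
# The stub run_step1/run_step2 stand in for the real commands.
mkdir -p /tmp/par_demo
cat > /tmp/par_demo/doit.sh <<'EOF'
#!/usr/bin/env bash
run_step1() { tr 'a-z' 'A-Z' < "$1"; }   # stand-in
run_step2() { wc -l < "$1"; }            # stand-in
run_step1 "$1" > "$1.out"
run_step2 "$1.out" > "$1.result"
EOF
chmod +x /tmp/par_demo/doit.sh
printf 'hi\n' > /tmp/par_demo/x.txt

# With GNU parallel installed, the invocation would be:
#   parallel /tmp/par_demo/doit.sh ::: /tmp/par_demo/*.txt
# A portable equivalent using xargs, 4 jobs at a time:
printf '%s\0' /tmp/par_demo/*.txt | xargs -0 -n1 -P4 /tmp/par_demo/doit.sh
```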

Try this:

declare -a PROCNUMS
ITERATOR=0
for filename in ~/dir/*; do
    run_step1 "$filename" > "${filename}.out" &
    PROCNUMS[$ITERATOR]=$!
    let "ITERATOR=ITERATOR+1"
done

ITERATOR=0
for outfile in ~/dir/*.out; do
    wait ${PROCNUMS[$ITERATOR]}
    run_step2 outfile >${outfile}.result
    let "ITERATOR=ITERATOR+1"
done

This will make an array of the created process IDs, then wait for each one in order as it needs to be completed. Note that it relies on there being a one-to-one relationship between in and out files, and on the directory not changing while it is running.

Note: for a small performance boost, you could now run the second loop asynchronously too, assuming each file is independent.

I hope this helps, but if you have any questions please comment.

The Bash builtin wait can wait for a specific background job or all background jobs to complete. The simple approach would be to just insert a wait in between your two loops. If you'd like to be more specific, you could save the PID for each background job and wait PID directly before run_step2 inside the second loop.
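The simple "one wait between the two loops" variant can be sketched as follows; sed and cat are placeholder stand-ins for the real run_step1/run_step2, and /tmp/wait_demo stands in for ~/dir:

```shell
#!/usr/bin/env bash
# Sketch: start all step1 jobs, block on one wait, then run step2 serially.
run_step1() { sed 's/^/step1:/' "$1"; }   # stand-in for the slow command
run_step2() { cat "$1"; }                 # stand-in for the second command
mkdir -p /tmp/wait_demo
printf 'abc\n' > /tmp/wait_demo/f1

for filename in /tmp/wait_demo/f1; do
    run_step1 "$filename" > "${filename}.out" &
done
wait    # blocks until every backgrounded run_step1 has exited

for outfile in /tmp/wait_demo/*.out; do
    run_step2 "$outfile" > "${outfile}.result"
done
```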

After the loop that executes step1, you could write another loop that executes the fg command, which moves the last process sent to the background into the foreground.

You should be aware that fg can return an error if a process has already finished.

After the loop of fg calls, you can be sure that all step1 jobs have finished.
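A minimal sketch of that approach. One caveat: fg needs job control, which is off in non-interactive shells, so a script has to enable it with set -m first; sleep stands in for the backgrounded step1 jobs:

```shell
#!/usr/bin/env bash
set -m                                  # enable job control so fg works in a script
sleep 0.3 &                             # stand-ins for backgrounded run_step1 jobs
sleep 0.1 &
while fg %% 2>/dev/null; do :; done     # each fg blocks on one job; it fails once none remain
echo "all background jobs finished" > /tmp/fg_demo.done
```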
