
Multiple shell script workers

We'd like to interpret tons of coordinates and do something with them using multiple workers. What we have:

coords.txt

100, 100, 100
244, 433, 233
553, 212, 432
776, 332, 223
...
8887887, 5545554, 2243234

worker.sh

#!/usr/bin/env bash

# Match lines of the form "x, y, z" and capture the three numbers
coord_reader='^([0-9]+), ([0-9]+), ([0-9]+)$'
while IFS='' read -r line || [[ -n "$line" ]]; do
    if [[ $line =~ $coord_reader ]]; then
        x=${BASH_REMATCH[1]}
        y=${BASH_REMATCH[2]}
        z=${BASH_REMATCH[3]}
        echo "x is $x, y is $y, z is $z"
    fi
done < "$1"

To execute worker.sh we call bash worker.sh coords.txt
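For the sample data above, the worker prints one line per parsed coordinate:

x is 100, y is 100, z is 100
x is 244, y is 433, z is 233
...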

Because we have millions of coordinates, we need to split coords.txt and create multiple workers all doing the same task, with one worker per chunk: coordsaa, coordsab, coordsac, and so on.

So we split coords.txt using split:

split -l 1000 coords.txt coords
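split names each chunk by appending two-letter suffixes to the given prefix, 1000 lines per chunk, so the directory now contains:

coords.txt  coordsaa  coordsab  coordsac  ...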

But, how to assign one file per worker?

I am new to Stack Overflow; feel free to comment so I can improve my asking skills.

To run workers from bash to process a lot of files:

File architecture:

files/
runner.sh
worker.sh

files/ : a folder containing many files (for example 1000)
runner.sh : launches the workers
worker.sh file : the task that processes one file

For example:

worker.sh :

#!/usr/bin/env bash

# Simulate some work, then report which file was handled
sleep 5
echo "$1"

To run all files in files/, one file per worker:

runner.sh:

#!/usr/bin/env bash

n_processes=$(find files/ -type f | wc -l)
echo "spawning ${n_processes} workers"

# One worker per file (note: word-splits on whitespace in names)
for file in $(find files/ -type f); do
    bash worker.sh "${file}" &
done

wait
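As noted in the comment, looping over $(find ...) splits file names on whitespace. If names may contain spaces, a null-delimited read is safer; a minimal sketch of the same loop:

#!/usr/bin/env bash

# One worker per file, robust to whitespace in file names
while IFS= read -r -d '' file; do
    bash worker.sh "${file}" &
done < <(find files/ -type f -print0)

wait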

/!\ 1000 processes is a lot!

It is better to create a "pool of processes". Here it only guarantees a maximum number of processes running at the same time; an old child process is not reused for a new task but dies when its task is done or fails:

#!/usr/bin/env bash

n_processes=8
echo "max processes: ${n_processes}"

for file in $(find files/ -type f); do
    # Wait until a slot frees up (the sleep keeps this loop from pegging a CPU)
    while [[ $(jobs -r | wc -l) -ge ${n_processes} ]]; do
        sleep 0.1
    done
    bash worker.sh "${file}" &
    echo "spawned process pid: $!"
done

wait

It is not really a pool of processes, but it avoids having many processes alive at the same time; the maximum number of processes alive at once is given by n_processes.

Execute bash runner.sh.
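On bash 4.3 or newer, wait -n blocks until any one background job exits, which replaces the polling loop entirely; a minimal sketch of the same throttle:

#!/usr/bin/env bash

n_processes=8

for file in files/*; do
    # Once the cap is reached, block until one worker exits
    while (( $(jobs -r | wc -l) >= n_processes )); do
        wait -n
    done
    bash worker.sh "${file}" &
done

wait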

I would do this with GNU Parallel. Say you want 8 workers running at a time until all the processing is done:

parallel -j 8 --pipepart -a coords.txt --fifo bash worker.sh {}

where:

  • -j 8 means "keep 8 jobs running at a time"
  • --pipepart means "split the input file into parts"
  • -a coords.txt means "this is the input file"
  • --fifo means "create a temporary fifo to send the data to, and save its name in {} to pass to your worker script"
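If GNU Parallel is not available, a rough substitute (without the automatic input splitting, so feed it the chunks produced by split above) is xargs -P; a sketch assuming the coords?? chunk files sit in the current directory:

# At most 8 workers at a time, one chunk file per worker
printf '%s\n' coords?? | xargs -P 8 -n 1 bash worker.sh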
