I have a shell script, job.sh. Its contents are below:
#!/bin/bash
table=$1
sqoop job --exec "${table}"
Now when I do ./job.sh table1, the script executes successfully.
I have the table names in a file, tables.txt. Now I want to loop over tables.txt and execute the job.sh script 10 times in parallel.
How can I do that?
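For reference, tables.txt is assumed (based on the invocations listed below) to hold one table name per line:

```
table1
table2
...
table10
```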
Ideally, when I execute the script, I want it to run like below:
./job.sh table1
./job.sh table2
./job.sh table3
./job.sh table4
./job.sh table5
./job.sh table6
./job.sh table7
./job.sh table8
./job.sh table9
./job.sh table10
What are the options available?
Simply, with GNU Parallel:
parallel -a tables.txt --dry-run sqoop job --exec {}
Sample output (the order varies, since parallel schedules the jobs concurrently):
sqoop job --exec table7
sqoop job --exec table8
sqoop job --exec table9
sqoop job --exec table6
sqoop job --exec table5
sqoop job --exec table4
sqoop job --exec table3
sqoop job --exec table2
sqoop job --exec table1
sqoop job --exec table10
If that looks correct, just remove the --dry-run
and run again for real.
If you would like 4 jobs run at a time, use:
parallel -j 4 ....
If you would like one job per CPU core, that is the default, so you don't need to do anything.
If you would like the output kept in the same order as the input, add the -k option:
parallel -k ...
You can just do
< tables.txt xargs -I% -n1 -P10 echo sqoop job --exec %
The -P10 option runs 10 processes in parallel (the echo makes this a dry run; drop it to execute for real). And you don't even need the helper script.
As @CharlesDuffy commented, you don't need the -I; e.g., even simpler:
< tables.txt xargs -n1 -P10 echo sqoop job --exec
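Once the dry run looks right, drop the echo and xargs will invoke the script itself. A self-contained sketch of that, using a stub job.sh that just echoes the command (standing in for the real sqoop wrapper from the question):

```shell
#!/bin/bash
# Stub job.sh standing in for the real sqoop wrapper (assumption:
# the real script is the ./job.sh from the question).
printf '#!/bin/bash\necho "sqoop job --exec $1"\n' > job.sh
chmod +x job.sh

# Stand-in tables.txt with five table names.
printf 'table%s\n' 1 2 3 4 5 > tables.txt

# No echo this time: xargs runs ./job.sh itself, up to 10 at a time.
< tables.txt xargs -n1 -P10 ./job.sh > out.txt
sort out.txt
```

With the real job.sh in place, the same xargs line runs the actual sqoop jobs.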
Option 1
Start all scripts as background processes by appending &, e.g.
./job.sh table1 &
./job.sh table2 &
./job.sh table3 &
However, this will run all jobs at the same time!
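Applied to tables.txt, Option 1 is just a loop that backgrounds each invocation and then waits. A self-contained sketch, again with a stub job.sh standing in for the real one:

```shell
#!/bin/bash
# Stub job.sh standing in for the real sqoop wrapper from the question.
printf '#!/bin/bash\necho "sqoop job --exec $1"\n' > job.sh
chmod +x job.sh
printf 'table%s\n' 1 2 3 > tables.txt   # stand-in tables.txt

while IFS= read -r table; do
  ./job.sh "$table" >> all.txt &   # & launches every job at once
done < tables.txt
wait                               # block until all background jobs finish
```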
Option 2
For more time- or memory-consuming scripts, you can run a limited number of tasks at the same time using xargs, as for example described here.
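If you would rather not use xargs, a reasonably recent bash can limit concurrency on its own with wait -n (bash 4.3+). A self-contained sketch under that assumption, with a stub job.sh in place of the real one:

```shell
#!/bin/bash
# Stub job.sh standing in for the real sqoop wrapper from the question.
printf '#!/bin/bash\necho "sqoop job --exec $1"\n' > job.sh
chmod +x job.sh
printf 'table%s\n' 1 2 3 4 5 6 > tables.txt   # stand-in tables.txt

max_jobs=3
while IFS= read -r table; do
  # Block while $max_jobs jobs are already running; wait -n returns
  # as soon as any one background job exits (requires bash 4.3+).
  while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
    wait -n
  done
  ./job.sh "$table" >> results.txt &
done < tables.txt
wait   # wait for the remaining jobs
```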