Using OpenMP in shell scripting

I am working on Mac OS X, using bash in the terminal.

I have found a lot of literature on using OpenMP in C; however, my situation is different: I am running a shell script that calls an executable 1000 times with 1000 different parameters. None of the calls depend on each other, so I know this could be a good spot to parallelize my code.

The problem, then, is how do you use #pragma omp parallel for in a bash shell script? I've also thought of rewriting what the shell script does in a .c file, but I wasn't sure how to call an executable and rename and move files from C.

Here is the shell script with understandable names:

#!/bin/zsh
for ((x = 0, y = 4 ; x < 1000 ; x++, y *= 1.0162))
do
    typeset -F 3 y
    echo $y
    ./program arg1 $y /path0 arg2
    mv file1.ppm file1.$(printf %04d $x).ppm
    mv file1.$(printf %04d $x).ppm /path1
    mv file2.ppm file2.$(printf %04d $x).ppm
    mv file2.$(printf %04d $x).ppm /path2
done

paste a.txt b.txt > c.txt
mv c.txt /path3

Explanation of variables:

program takes 4 parameters; for the purposes of this script, only y varies.

arg1 and arg2 are given.

All the /path's are paths to various places where I store data.

file1.ppm and file2.ppm are produced by the ./program call.

a.txt and b.txt are given and computed in the for loop, respectively.

So OpenMP is something that's built into compilers, and isn't something you can access from bash; but you don't need to.

Let's consider a single run: you could have a script (call it dorun) which runs one complete job:

#!/bin/zsh
x=$1
# recompute y = 4 * 1.0162^x from x alone, so each run is independent of the others
y=$( echo $x | awk '{print 4.*(1.0162^$1)}' )
typeset -F 3 y
echo $y
./program arg1 $y /path0 arg2
mv file1.ppm file1.$(printf %04d $x).ppm
mv file1.$(printf %04d $x).ppm /path1
mv file2.ppm file2.$(printf %04d $x).ppm
mv file2.$(printf %04d $x).ppm /path2

If you call this with, say, dorun 5, you'll get the x=5 job from the loop above.
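For instance (assuming dorun sits next to program in the current directory), you can test a single job by hand before parallelizing anything:

chmod +x dorun
./dorun 5    # one complete job: computes y for x=5, runs ./program, moves the .ppm files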

Now you have to figure out how to run this for 0...999 in parallel. My favourite tool for doing this sort of thing is GNU parallel, which lets you fire off many of these jobs, even if they take different lengths of time, and keep a fixed number of processors busy. At our centre we have instructions on its use, and there are many other places with good examples.

In this case, you could do something as simple as:

seq 0 999 | parallel -j 4 --workdir $PWD ./dorun {}
paste a.txt b.txt > c.txt
mv c.txt /path3

to run this script for parameters x=0...999 with up to 4 jobs at a time on the local machine; there are even options for making use of other hosts.
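As a rough sketch of the multi-host option, assuming hypothetical machines node1 and node2 that you can ssh into without a password and that have dorun and program in the same working directory (with remote hosts you also need a shared filesystem, or GNU parallel's file-transfer options, since output files land on whichever machine ran the job):

# -S / --sshlogin spreads jobs across the listed hosts;
# the special name ':' includes the local machine as well
seq 0 999 | parallel -j 4 -S :,node1,node2 --workdir $PWD ./dorun {}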

In addition to Jonathan's answer, I recommend task spooler. It allows you to schedule your tasks, configure how many of them should run in parallel at once, check how many have finished, control where the output goes, etc.
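A minimal sketch of what that could look like here, assuming the task spooler binary is installed as ts (on some systems it is packaged as tsp):

ts -S 4                    # allow up to 4 queued jobs to run at once
for x in $(seq 0 999); do
    ts ./dorun $x          # enqueue each job; ts starts them as slots free up
done
ts                         # with no arguments, lists queued/running/finished jobs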

I used to use Portable Batch System (PBS) for submitting parallel jobs. You can set this up with clusters of computers as well, submit your jobs, run them in the background at low priority, and then collect the output at the host.

http://en.wikipedia.org/wiki/Portable_Batch_System

However, the free version of this no longer seems to be supported, so I don't know what is recommended now.
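For reference, this kind of sweep is typically submitted to a PBS-style scheduler as a job array; here is a sketch of a hypothetical job.pbs under Torque (the array directive and the index variable, PBS_ARRAYID below, vary between PBS implementations):

#!/bin/zsh
#PBS -N param_sweep
#PBS -t 0-999            # one array task per parameter value (Torque syntax)
cd $PBS_O_WORKDIR        # PBS starts jobs in your home directory; move to the submit directory
./dorun $PBS_ARRAYID     # each array task runs one value of x

You would submit it with qsub job.pbs and collect the output files afterwards.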
