Using bash to loop through nested folders to run script in current working directory

Question

I've got (what feels like) a fairly simple problem but my complete lack of experience in bash has left me stumped. I've spent all day trying to synthesize a script from many different SO threads explaining how to do specific things with unintuitive commands, but I can't figure out how to make them work together for the life of me.

Here is my situation: I've got a directory full of nested folders, each containing a file with extension .7 and another file with extension .pc, plus a whole bunch of unrelated stuff. It looks like this:

Folder A
   Folder 1
      Folder x
        data_01.7
        helper_01.pc
        ...
      Folder y
        data_02.7
        helper_02.pc
        ...
   ...
   Folder 2
      Folder z
        data_03.7
        helper_03.pc
      ...
   ...
Folder B
...

I've got a script that I need to run in each of these folders that takes in the name of the .7 file as an input.

pc_script -f data.7 -flag1 -other_flags

The current working directory needs to be the folder with the .7 file when running the script, and the helper .pc file also needs to be present in it. After the script is finished running, there are a ton of new files and directories. However, I need to take just one of those output files, result.h5, and copy it to a new directory maintaining the same folder structure but with a new name:

Result Folder/Folder A/Folder 1/Folder x/new_result1.h5

I then need to run the same script again with a different flag, flag2, and copy the new version of that output file to the same result directory with a different name, new_result2.h5. The folders all have pretty arbitrary names, though there aren't any spaces or special characters beyond underscores.

Here is an example of what I've tried:

#!/bin/bash

DIR=".../project/data"
for d in */ ; do
    for e in */ ; do
        for f in */ ; do
            for PFILE in *.7 ; do
                echo "$d/$e/$f/$PFILE"
                cd "$DIR/$d/$e/$f"
                echo "Performing operation 1"
                pc_script -f "$PFILE" -flag1
                mkdir -p ".../results/$d/$e/$f"
                mv "results.h5" ".../project/results/$d/$e/$f/new_results1.h5"
                echo "Performing operation 2"
                pc_script -f "$PFILE" -flag 2
                mv "results.h5" ".../project/results/$d/$e/$f/new_results2.h5"
            done
        done
    done
done

Obviously, this didn't work. I've also tried using find with -execdir but then I couldn't figure out how to insert the name of the file into the script flag. I'd appreciate any help or suggestions on how to carry this out.

Answer 1

If there's only one .7 file in each directory then you can try this:

#!/bin/bash
shopt -s globstar nullglob

saveroot=project/results
dataroot=project/data

for filepath in "${dataroot}"/**/*.7
do
    dirpath="${filepath%/*}"
    filename=${filepath#"$dirpath"/}

    pushd "$dirpath" > /dev/null || continue

    echo "$filepath"
    echo "Performing operation 1"
    #pc_script -f "$filename" -flag1
    touch results.h5    # stand-in for pc_script output; delete once pc_script is enabled
    mv results.h5 results_1.h5

    echo "Performing operation 2"
    #pc_script -f "$filename" -flag2
    touch results.h5    # stand-in for pc_script output; delete once pc_script is enabled
    mv results.h5 results_2.h5

    popd > /dev/null

    # ${dirpath#"$dataroot"} already starts with "/", so no extra slash is needed
    savepath="$saveroot${dirpath#"$dataroot"}"
    mkdir -p "${savepath}"
    mv "${dirpath}"/results_*.h5 "$savepath"/
done

The script doesn't check for the existence of the .pc file, but if your files are named as in the question, adding such a check is straightforward.
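Such a check might look like the sketch below. It assumes bash with nullglob enabled (as Answer 1's script already does), so an unmatched glob expands to nothing instead of to the literal pattern; has_pc_file is a hypothetical name.

```bash
#!/bin/bash
shopt -s nullglob   # unmatched globs expand to nothing (as in Answer 1)

# Hypothetical helper: succeed only if directory "$1" holds at least one *.pc file.
has_pc_file() {
    local pcfiles=( "$1"/*.pc )
    (( ${#pcfiles[@]} > 0 ))
}
```

Inside Answer 1's loop, a line such as has_pc_file "$dirpath" || continue placed before the pushd would skip directories that have no .pc file.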

Answer 2

Another, perhaps more flexible, approach is to use the find command with the -exec option to run a short "helper-script" for each file ending in ".7" found below a given directory path. The -name option lets find locate all files ending in ".7" below that directory using simple file globbing (wildcards). The helper-script then performs the same operation for each file found by find and handles copying result.h5 to the proper directory.

The form of the command will be:

find /path/to/search -type f -name "*.7" -exec /path/to/helper-script '{}' \;

Here the -type f option tells find to return only files (not directories), and -name "*.7" restricts those to names ending in ".7". Your helper-script needs to be executable (e.g. chmod +x helper-script), and unless it is in your PATH, you must provide the full path to the script in the find command. The '{}' is replaced by each filename found (including its path from the search starting point) and passed as an argument to your helper-script. The \; simply terminates the command executed by -exec.
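To see exactly what find substitutes for '{}', here is a throwaway demo on a small tree shaped like the question's (the directory names are made up, and printf stands in for the helper-script):

```bash
#!/bin/bash
# Throwaway tree mimicking the question's layout (names are made up).
root=$(mktemp -d)
mkdir -p "$root/Folder_A/Folder_1/Folder_x" "$root/Folder_A/Folder_2/Folder_z"
touch "$root/Folder_A/Folder_1/Folder_x/data_01.7" \
      "$root/Folder_A/Folder_1/Folder_x/helper_01.pc" \
      "$root/Folder_A/Folder_2/Folder_z/data_03.7"

# printf stands in for helper-script: it echoes the argument find puts in
# place of '{}'. Only the two .7 files match; the .pc file is ignored.
found=$(cd "$root" && find . -type f -name "*.7" -exec printf '%s\n' '{}' \; | sort)
printf '%s\n' "$found"

rm -rf "$root"
```

This prints ./Folder_A/Folder_1/Folder_x/data_01.7 and ./Folder_A/Folder_2/Folder_z/data_03.7, i.e. each path relative to the starting point ".", which is what the helper-script receives as $1.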

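(Note that there is another form of -exec called -execdir, which runs the command from inside the directory holding each match, and another terminator, '+', which passes many found files to a single invocation of the command; with -execdir, each invocation only gets files from the same directory. -execdir is a bit safer, but it has additional PATH requirements for the command being run: GNU find, for example, refuses to use it if your PATH contains a relative directory such as ".". Since you have only one ".7" file per directory, there isn't much benefit here.)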
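Since the question mentions trying -execdir, here is a sketch of that route. It also shows how to get the file name into the pc_script flag: pass {} to bash -c as a positional argument and use "$1" inside. Everything here is an assumption for illustration: process_one, run_all, DATA_ROOT and RESULT_ROOT are hypothetical names; pc_script is assumed to be reachable from a child bash (on your PATH, or an exported function) and to write result.h5 into the current directory; and any leftover result.h5 is deleted before each run so a failed run can't copy a stale file.

```bash
#!/bin/bash
# Sketch of the -execdir route. find changes into each directory holding a
# match and substitutes {} with the bare file name (GNU find: "./data_01.7").
# ASSUMPTIONS: pc_script is callable from a child bash and writes result.h5
# into the current directory; all names below are hypothetical.

process_one() {
    local file7=${1#./}                  # "./data_01.7" -> "data_01.7"
    local here
    here=$(pwd -P)                       # physical path of the directory find switched to
    local rel=${here#"$DATA_ROOT"}       # this directory's path below DATA_ROOT
    local dest="$RESULT_ROOT$rel"
    mkdir -p "$dest" || return 1
    local n
    for n in 1 2; do
        rm -f result.h5                  # never copy a stale result from the previous run
        pc_script -f "$file7" "-flag$n" -other_flags &&
            [ -s result.h5 ] && cp result.h5 "$dest/new_result$n.h5"
    done
}
export -f process_one                    # make it visible to the bash started by find

run_all() {
    # Use absolute, symlink-free roots so they line up with pwd -P above.
    DATA_ROOT=$(cd "$DATA_ROOT" && pwd -P) || return 1
    mkdir -p "$RESULT_ROOT" && RESULT_ROOT=$(cd "$RESULT_ROOT" && pwd -P) || return 1
    export DATA_ROOT RESULT_ROOT
    # {} is passed as "$1" rather than pasted into the command string.
    find "$DATA_ROOT" -type f -name '*.7' -execdir bash -c 'process_one "$1"' _ {} \;
}
```

Set DATA_ROOT and RESULT_ROOT (for example to your data and results directories), then call run_all; for the question's layout, new_result1.h5 and new_result2.h5 would end up under the mirrored path below RESULT_ROOT.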

The helper-script just does what you need to do in each directory. Based on your description it could be something like the following:

#!/bin/bash

dir="${1%/*}"     ## trim file.7 from end of path
cd "$dir" || {    ## change to directory or handle error
  printf "unable to change to directory %s\n" "$dir" >&2
  exit 1
}

destdir="/Result_Folder/$dir"   ## set destination dir for result.h5
mkdir -p "$destdir" || {        ## create with all parent dirs or exit
  printf "unable to create directory %s\n" "$destdir" >&2
  exit 1
}

ls *.pc >/dev/null 2>&1 || exit 1   ## check .pc file exists (quietly) or exit

file7="${1##*/}"  ## trim path from file.7 name

rm -f "result.h5"   ## drop any stale result.h5 so the check below sees only fresh output
pc_script -f "$file7" -flag1 -other_flags    ## first run

## check result.h5 exists and non-empty and copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result1.h5"

rm -f "result.h5"   ## first result is already copied; clear it before the second run
pc_script -f "$file7" -flag2 -other_flags    ## second run

## check result.h5 exists and non-empty and copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result2.h5"

This stores the path part of the file.7 argument in dir and changes to that directory. If it cannot change to the directory (for example, because of missing execute permission on it), the error is reported and the script exits. Next, the full directory structure is created below your Result_Folder with mkdir -p, with the same error handling if the directory cannot be created.
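To see what the two trims in the helper-script produce, here is the same parameter expansion applied to a made-up argument of the kind find passes in:

```bash
#!/bin/bash
# Hypothetical argument, shaped like what find hands to helper-script.
arg="./Folder_A/Folder_1/Folder_x/data_01.7"

dir="${arg%/*}"     # remove the shortest match of "/*" from the end
file7="${arg##*/}"  # remove the longest match of "*/" from the front
destdir="/Result_Folder/$dir"

printf '%s\n' "$dir" "$file7" "$destdir"
```

This prints ./Folder_A/Folder_1/Folder_x, data_01.7 and /Result_Folder/./Folder_A/Folder_1/Folder_x. The "./" is harmless, but it shows that destdir mirrors whatever path find reports: if find is given an absolute search path, that whole absolute path ends up below /Result_Folder, so running find from inside the data directory with . as the starting point keeps the mirrored tree short.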

ls is used as a simple check that a file ending in ".pc" exists in that directory. There are other ways to do this, such as piping the results to wc -l, but that starts additional processes that are best avoided.
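If you'd rather not rely on ls at all, bash's compgen builtin can test a glob without starting any external process. A sketch (pc_present is a hypothetical name), meant to be called after the cd in the helper-script:

```bash
#!/bin/bash
# Hypothetical helper: succeed if the current directory has at least one *.pc file.
# compgen -G lists matches for the glob and fails when there are none; it is a
# bash builtin, so no extra process is started.
pc_present() {
    compgen -G "*.pc" > /dev/null
}
```

In the helper-script, the ls line would then become pc_present || exit 1.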

(Also note that Linux and Mac systems have files ending in ".pc" used by pkg-config when building programs from source. They should not conflict with your files, but be aware that they exist in case you start chasing why unexpected ".pc" files are found.)

After all tests are performed, the path is trimmed from the current ".7" filename, storing just the filename in file7. The file7 variable is then used in your pc_script command (which should also include the full path to the script if it is not in your PATH). After pc_script runs, [ -s "result.h5" ] verifies that result.h5 exists and is non-empty before copying that file to your Result_Folder location.

That should get you started. Using find to locate all .7 files lets the tool designed for finding files do its job, rather than hand-rolling your own solution. That way you only have to concentrate on what should be done for each file found. (Note: I don't have pc_script or the files, so I have not tested this end-to-end, but it should be very close if not right on the money.)

There is nothing wrong with writing your own routine, but using find eliminates many places where bugs can hide.

Let me know if you have further questions.

solution1
0 2022-02-03 01:34:07

solution2
0 2022-02-03 05:27:08
