Shell script to zip files with a limit on the number of records and a particular naming convention

We would like to zip the files in a directory, limiting the number of records per archive, and name each zip file after the directories the files came from.

Example: we have two parent directories named after dates (2021-10-01 and 2021-10-02). Each of these parent directories contains subdirectories named after countries, and each country directory contains a number of files.

2021-10-01/USA, 2021-10-01/UK
2021-10-02/USA, 2021-10-02/UK

We would like to zip each country directory with a limited number of records per archive, and the zip file should be named parentdirectory_countrydirectory.zip (for example, 2021-10-01_USA.zip).

My script accepts the start and end dates as parameters and passes them to an SQL query. The query extracts the data from the database into this structure: one parent directory per date, with country subdirectories containing the files. I am skipping the SQL query part of the script here.

#!/bin/bash
startd=$1
endd=$2

compress () {
 startd=$(date -d $startd +%Y%m%d)
        endd=$(date -d $endd +%Y%m%d)
        while [[ $startd -le $endd ]]
        do
           tempdate=$(date -d $startd +"%Y-%m-%d")
           dirl+=" $tempdate"
           startd=$(date -d"$startd + 1 day" +"%Y%m%d")
        done
        echo $dirl

 for j in $dirl
 do
    if [ -d "$j" ]; then
       cd $j
       for d in *
       do
           zip ${j}_${d}.zip $d
           mv ${j}_${d}.zip ../
       done
     else
       echo "no data extracted on: $j"
     fi
   cd ..
 done
}

I would like to zip the files with a limit on the number of records per archive. The name could be parentdirectory_subdirectory1.zip, with the number incremented for each further archive under the same naming convention.

Note: "number of records" means the files in the subdirectories, which are extracted by the SQL query. The USA subdirectory may contain thousands of files, so I want to split its contents into zip files of, say, 200 files each, named with the same convention: 2021-10-01_USA.zip, 2021-10-01_USA1.zip, etc.

This is a bit tricky to do in Bash, but you can use e.g. xargs to conveniently split a long list of items into manageable chunks. The challenge then is to pass in a new file name for each zip file. Here's one quick and dirty attempt.

compress () {
    local startd=$(date -d "$1" +%Y%m%d)
    local endd=$(date -d "$2" +%Y%m%d)
    local mm
    local j
    local d
    while [[ $startd -le $endd ]]
    do
        # Turn yyyymmdd into yyyy-mm-dd with parameter expansions only
        mm=${startd#????}
        j="${startd%????}-${mm%??}-${mm#??}"
        startd=$(date -d "$startd + 1 day" +"%Y%m%d")

        if [ -d "$j" ]; then
            for d in "$j"/*/; do
                d=${d%/}      # strip the trailing slash from the glob match
                d=${d##*/}    # keep only the country directory name
                printf '%s\0' "$j/$d"/* |
                # bash rather than sh: the C-style for loop is a Bash feature
                xargs -r -0 -n 200 bash -c '
                    for ((i=0; i<=99; i++)); do
                        test -e "$0${i#0}.zip" || break
                    done
                    zip -j "$0${i#0}.zip" "$@"' "${j}_${d}"
            done
        else
            echo "$0: no data extracted on: $j" >&2
        fi
    done
}

Random observations:

  • Please try to use standard indentation; random variations in whitespace confuse readers and probably yourself.
  • The arguments should be passed to the function when you call it, instead of stored in global variables.
  • Random quoting fixes; see also "When should I wrap quotes around a shell variable?"
  • Process the dates one by one and then forget them, instead of separately collecting them in memory first.
  • Rather than call date again to insert dashes for the yyyy-mm-dd format, use a series of parameter expansions. It's a bit tedious code-wise, but avoids calling an external process for something which the shell can do much quicker with internal facilities.
  • Create the zip files directly in the parent directory rather than moving them when done
  • We use zip -j to remove directory names from the input files so that we don't have to cd into and back out of each directory. (This is slightly error-prone if you have directory symlinks.)
  • Send error messages to standard error >&2 and include the name of the script which created the message in the message itself.
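
The parameter-expansion point in the list above can be tried out in isolation. This is only a sketch: dash_date is a hypothetical helper name which is not part of the script, and it assumes its input is always exactly eight digits.

```shell
# Hypothetical helper: insert dashes into a yyyymmdd string without calling date.
dash_date () {
    local ymd=$1
    local md=${ymd#????}    # drop the four-digit year: 20211001 -> 1001
    # ${ymd%????} keeps the year, ${md%??} the month, ${md#??} the day
    printf '%s-%s-%s\n' "${ymd%????}" "${md%??}" "${md#??}"
}

dash_date 20211001    # prints 2021-10-01
```

Each ? in the pattern matches exactly one character, so #???? removes four characters from the front and %?? removes two from the end.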

The real meat is in the slightly complex xargs invocation.

We printf the file names to be zipped as null-separated items so that we can correctly cope with arbitrary file names. (See http://mywiki.wooledge.org/BashFAQ/020 for details.) The -0 argument to xargs is a GNU extension to enable this. The -r argument simply says to do nothing if there is no input (i.e. there were no files in the directory; you probably want shopt -s nullglob too, so that a wildcard which matches nothing expands to nothing).
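
To see why the null separator matters, here is a small sketch. It assumes GNU xargs and uses a throwaway directory from mktemp; the file names are made up for illustration.

```shell
tmp=$(mktemp -d)
# One plain name, one with a space, one with an embedded newline
touch "$tmp/plain.txt" "$tmp/with space.txt" "$tmp/two
lines.txt"

# Each name arrives as exactly one argument, so the count is 3
count=$(printf '%s\0' "$tmp"/* | xargs -r -0 sh -c 'echo "$#"' _)
echo "$count"

# With -r, empty input means the command is not run at all
empty=$(printf '' | xargs -r -0 echo ran)
echo "empty input produced: '$empty'"

rm -rf "$tmp"
```

With newline-separated input, the third file would have been split into two bogus names.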

The -n 200 says to restrict input to a maximum of 200 files at a time, and we then pass those 200 or fewer file names to the inline -c script.

That script receives the base name of the zip file we want to create as $0 (this is just a hack to avoid having to separately shift off an argument from the argument list it receives; the first argument after the script text is otherwise usually unused, so we use that to smuggle in this value). It uses a simple for loop to find the first unused name with this prefix, using an empty string for the very first one.
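
The $0 trick and the name search can be exercised without zipping anything. The following is a sketch under stated assumptions: it pretends that two archives already exist in a temporary directory, and a.txt and b.txt are placeholder arguments, not real files.

```shell
tmp=$(mktemp -d)
# Pretend the first two archives were already written
touch "$tmp/2021-10-01_USA.zip" "$tmp/2021-10-01_USA1.zip"

# The argument right after the script text becomes $0; the rest become "$@"
next=$(bash -c '
    for ((i=0; i<=99; i++)); do
        test -e "$0${i#0}.zip" || break    # ${i#0} is empty when i is 0
    done
    echo "$0${i#0}.zip ($# files)"' "$tmp/2021-10-01_USA" a.txt b.txt)

name=${next##*/}    # drop the temporary directory from the result
echo "$name"        # prints 2021-10-01_USA2.zip (2 files)
rm -rf "$tmp"
```

The expansion ${i#0} strips one leading zero, so only the value 0 itself turns into an empty suffix; 10, 20 and so on are left alone.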

(Maybe change this - I think your proposed convention is slightly confusing. I would prefer to have xxx.zip solely if there is only a single file in the set, and xxx1.zip, xxx2.zip, etc. when there are several.)

Once we have established the file name, we simply zip the files we receive as arguments into that file.

xargs takes care of portioning the input file set into chunks of the desired size and calling the sh -c script as many times as necessary.
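
The chunking is easy to observe on its own with a smaller limit. This sketch uses -n 2 instead of -n 200 and five placeholder items, and again assumes an xargs which supports -0.

```shell
# Five items, at most two per invocation: the inline script runs three times
out=$(printf '%s\0' a b c d e |
    xargs -0 -n 2 sh -c 'echo "count=$# names=$*"' _)
printf '%s\n' "$out"
# prints:
# count=2 names=a b
# count=2 names=c d
# count=1 names=e
```

The last invocation simply receives whatever is left over, which is why the final zip file may hold fewer files than the limit.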

This is probably a bit intimidating at first; this would be a fair bit easier in a modern scripting language like Python.
