
ls | grep with variable as regex

I'm writing a bash script to automate a few tasks. One of the things I have to do is search for a pattern among filenames in a directory, then loop through the results.

When I run this script:

data=$(ls $A_PATH_VAR/*.ext | grep -o '201601[0-9]\{2\}\|201602[0-9]\{2\}')
echo $data

I get the expected result: a list of all the matches found among the filenames in $A_PATH_VAR/ with the extension .ext. However, when I store that pattern in a variable and then use it, like this:

startmo=201601
endmo=201602

mo=$((startmo+1))
grepstr="'$startmo[0-9]\{2\}"

while [ $mo -le $endmo ]
do
  grepstr="$grepstr\|$mo[0-9]\{2\}"
  mo=$((mo+1))
done

grepstr="$grepstr'"

echo $grepstr # correct

data=$(ls $A_PATH_VAR/*.ext | grep -o $grepstr)
echo $data

The pattern in $grepstr is echoed correctly (it contains the value '201601[0-9]\{2\}\|201602[0-9]\{2\}'), but $data is empty. Why is this?


My solution:

mo=$((startmo+1))
grepstr="($startmo[0-9][0-9]"

while [ $mo -le $endmo ]
do
  grepstr="$grepstr|$mo[0-9][0-9]"
  mo=$((mo+1))
done

grepstr="$grepstr)"

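# Loop over the glob directly instead of word-splitting ls output;
# this also drops setopt shwordsplit, a zsh option that bash doesn't have.
for file in "$A_PATH_VAR"/*.ext
do
  # $grepstr must stay unquoted here so that =~ treats it as a regex
  if [[ $file =~ $grepstr ]]
  then
    date=${BASH_REMATCH[0]}
  fi

  ...
done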
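The regex-building loop and the =~ match can be checked in isolation. This is a minimal bash sketch; build_month_regex is a hypothetical helper that performs the same loop as above, and the filenames are made up. Like the original loop, it simply adds 1 to the YYYYMM number, so it only yields valid months when start and end fall in the same year.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an ERE such as (201601[0-9][0-9]|201602[0-9][0-9])
# covering every month from $1 to $2 (same-year ranges only, as in the loop above).
build_month_regex() {
  local mo=$1 end=$2 re=""
  while (( mo <= end )); do
    re+="${re:+|}${mo}[0-9][0-9]"   # add "|" only after the first alternative
    mo=$(( mo + 1 ))
  done
  printf '(%s)\n' "$re"
}

regex=$(build_month_regex 201601 201602)

# Made-up filenames; only the first one falls inside the range.
for name in report_20160115.ext report_20160307.ext notes.ext; do
  if [[ $name =~ $regex ]]; then            # $regex unquoted: treated as a regex
    printf '%s -> %s\n' "$name" "${BASH_REMATCH[0]}"
  fi
done
# prints: report_20160115.ext -> 20160115
```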

Below, I'm ignoring the fact that your input source is ls, beyond this opening note: ls should not be used in this manner, and find (which, in its GNU-extended form, supports a -regex operator) should be considered instead.
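As a minimal sketch of the find route, assuming GNU find (-maxdepth and -regextype are GNU extensions) and sort -z. list_matching is a hypothetical helper name, and the throwaway directory with made-up filenames stands in for $A_PATH_VAR:

```shell
#!/usr/bin/env bash
# Print the basenames of files directly inside $1 whose full path matches ERE $2.
# -regex is matched against the whole path, so the pattern needs a leading .*
list_matching() {
  find "$1" -maxdepth 1 -type f -regextype posix-extended -regex "$2" -print0 |
    sort -z |
    while IFS= read -r -d '' file; do   # NUL-delimited: safe for odd filenames
      printf '%s\n' "${file##*/}"
    done
}

# Demo directory and filenames are made up for illustration.
A_PATH_VAR=$(mktemp -d)
touch "$A_PATH_VAR"/{a_20160105,b_20160228,c_20160301}.ext "$A_PATH_VAR"/d_20160110.txt

regex='.*(201601|201602)[0-9]{2}[^/]*\.ext'
list_matching "$A_PATH_VAR" "$regex"
# prints a_20160105.ext and b_20160228.ext, one per line
```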


In:

pattern="'pattern'"
grep $pattern

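...the double quotes (") are syntactic: the shell consumes them while parsing. The single quotes inside them are literal, because the outer, syntactic quotes specify that everything between them is part of the string (except where the rules for parsing double-quoted content differ). Quote removal is never applied to quotes that come out of an expansion, so in your script grep receives the single quotes as part of the regex: it requires a literal ' before the first alternative and after the last one. No filename contains that character, which is why $data is empty.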
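A quick way to see this, using a shortened, hypothetical pattern and filename: printf shows exactly what the shell hands to a command, and the quotes are still there.

```shell
#!/usr/bin/env bash
pattern="'2016'"

# The outer double quotes were removed while parsing; the inner ' were not.
printf '<%s>\n' $pattern        # prints <'2016'>

# grep is therefore searching for the text '2016' including both quote characters.
if printf '%s\n' data_20160101.ext | grep -q $pattern; then
  echo "match"
else
  echo "no match"               # this branch runs
fi
```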

Thus, when you run grep $pattern, the following happens:

  • The contents of $pattern are broken into words on any characters within IFS. By default, IFS contains only whitespace (space, tab, and newline); however, if you had IFS=a, then the value 'pattern' would be broken into two words, 'p and ttern', each keeping one of the literal single quotes.
  • Each of these words is then expanded as a glob. Thus, if your pattern had contained hello * world, splitting on the default IFS would produce the words hello, *, and world, and the * would then be replaced with a list of the files in the current directory.
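Both steps can be observed with printf, which prints each argument it receives on its own line. This sketch assumes a scratch directory holding two demo files (file1 and file2, made up here) so that the * has something to match:

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d) && cd "$demo_dir" || exit 1
touch file1 file2               # demo files for the glob to expand to

pattern="hello * world"
printf '<%s>\n' $pattern        # <hello> <file1> <file2> <world>: split, then globbed
printf '<%s>\n' "$pattern"      # <hello * world>: one word, no globbing

IFS=a
pattern="'pattern'"
printf '<%s>\n' $pattern        # <'p> <ttern'>: split on every "a"
unset IFS                       # back to default splitting behaviour
```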

Obviously, you don't want this. Thus, use only syntactic quotes if your goal is to prevent word splitting and glob expansion:

pattern="pattern"
grep "$pattern"
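Applied to the question, the original loop works once the literal ' characters are left out and the expansion is double-quoted. A sketch, assuming GNU grep (\| alternation in a basic regex is a GNU extension) and made-up filenames in place of the ls output:

```shell
#!/usr/bin/env bash
startmo=201601
endmo=201602

# Same construction as in the question, minus the literal ' characters.
# Inside double quotes, \{ and \| keep their backslashes.
grepstr="$startmo[0-9]\{2\}"
mo=$((startmo+1))
while [ "$mo" -le "$endmo" ]; do
  grepstr="$grepstr\|$mo[0-9]\{2\}"
  mo=$((mo+1))
done

printf '%s\n' a_20160105.ext b_20160228.ext c_20160301.ext | grep -o "$grepstr"
# prints 20160105 and 20160228, one per line
```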

BTW, if I had this task, I might write it as follows [to avoid needing to hand-build a regex for each possible date range]:

startmo=201601
endmo=201705
currmo=$startmo

# this requires GNU date
# on macOS, you can install this via MacPorts and invoke it as gdate
next_month() {
  date -d "+1 month ${1:0:4}-${1:4:2}-15" +%Y%m
}

# (( )) compares numerically; [[ ]] has no <= operator.
# Handle the current month before advancing, so both startmo and endmo are covered.
while (( currmo <= endmo )); do
  files=( *"$currmo"* )
  if [[ -e ${files[0]} ]]; then
    printf '%s\n' "${files[@]}"
  else
    echo "No files found for month $currmo" >&2
  fi
  currmo=$(next_month "$currmo")
done
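The comment above notes that next_month needs GNU date. Where that isn't available, a hypothetical drop-in replacement can do the month arithmetic in plain bash; this is a sketch that assumes well-formed YYYYMM input and does no validation:

```shell
#!/usr/bin/env bash
# Hypothetical GNU-date-free next_month: YYYYMM in, following YYYYMM out.
next_month() {
  local y=${1:0:4} m=${1:4:2}
  m=$(( 10#$m + 1 ))          # 10# stops "08"/"09" from being read as octal
  if (( m > 12 )); then
    m=1
    y=$(( y + 1 ))
  fi
  printf '%04d%02d\n' "$y" "$m"
}

next_month 201601   # 201602
next_month 201612   # 201701
```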
