
Filter files using regular expressions with sed in Unix

I'm documenting a shell script on a server at my job. It takes a series of files whose names start with "dat" and performs a particular task on each of them. The problem is that the script filters the files using a regular expression in a sed command, as follows:

namecmp=$(grep -l $name dat*.p |
    sed -e "s/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g" |
    sort -t '.' -k 1.7,1.14 |
    sed -e "s/^\(......\)\(....\)\(..\)\(..\)\(.*\)/\1\4\3\2\5/g" |
    tail -1)

I don't understand exactly how this regular expression filters the files. It would help to see some expected output, or example file names that this expression would filter.

Is there a way to find the possible strings that this expression accepts?

grep -l searches a list of files (dat*.p) for a regular expression ($name in your case, or rather whatever $name evaluates to) and prints only the names of the files in which a match was found.
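A minimal sketch of that behaviour in a throwaway directory. The file names, their contents and the pattern 4711 are all made up for illustration. Note that the glob dat*.p, not any regex, decides which files grep looks at:

```shell
#!/bin/sh
# Sketch of what `grep -l` does; every name and pattern here is hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf 'order 4711\n' > databc01022014.p    # contains the pattern
printf 'nothing here\n' > databc15032014.p  # does not contain it
printf 'order 4711\n' > other.p             # contains it, but is not dat*.p

name='4711'
# -l prints only the names of files with at least one matching line;
# "$name" is quoted so a pattern with spaces stays a single argument.
grep -l "$name" dat*.p
# prints: databc01022014.p
```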

These file names are then passed through the sed command, which substitutes (s for substitute) ^\(......\)\(..\)\(..\)\(....\)\(.*\) with \1\4\3\2\5, so it just regroups parts of each file name. The transformed file names are then passed to sort, and then to sed again, which undoes the regrouping and restores the original file names.
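A small sketch of the two sed steps, wrapped in helper functions whose names (rearrange, restore) are invented here; the file name is made up. It also shows that the sed expression does not filter anything: it matches any line of at least 14 characters, and sed prints non-matching lines unchanged, so the actual filtering is done by grep -l and the dat*.p glob.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two sed commands from the question.

# Step 1: keep chars 1-6, then emit chars 11-14, 9-10, 7-8, then the rest.
rearrange() {
  sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g'
}

# Step 2: the inverse mapping, with group widths 6, 4, 2, 2.
restore() {
  sed -e 's/^\(......\)\(....\)\(..\)\(..\)\(.*\)/\1\4\3\2\5/g'
}

printf '%s\n' databc01022014.p | rearrange             # databc20140201.p
printf '%s\n' databc01022014.p | rearrange | restore   # databc01022014.p

# A line shorter than 14 characters does not match and passes through as-is.
printf '%s\n' short.p | rearrange                      # short.p
```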

Finally, only the last file name is kept (tail -1) and the rest are thrown away. This could be done much more cheaply than by sorting all the file names, but who cares ;-)
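For what it's worth, a single pass that remembers the greatest date key avoids the sort entirely. This is only a sketch with a hypothetical helper (latest_by_date), assuming names shaped like datXXXDDMMYYYY... with the date in characters 7 to 14 and no "." before them. If two files carry the same date it keeps the first one, which is not necessarily the one sort | tail -1 would pick:

```shell
#!/bin/sh
# Hypothetical helper: read file names on stdin and print the one whose
# DDMMYYYY date (characters 7-14, an assumption) is the latest.
latest_by_date() {
  awk '{
    # build a YYYYMMDD string so that string comparison is chronological
    key = substr($0, 11, 4) substr($0, 9, 2) substr($0, 7, 2)
    if (NR == 1 || key > best) { best = key; latest = $0 }
  }
  END { if (NR > 0) print latest }'
}

printf '%s\n' databc31122013.p databc01022014.p databc15012014.p | latest_by_date
# databc01022014.p
```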

Effectively, this line finds the name of the "last" file whose contents match the regexp in $name. What "last" means is determined by sorting the regrouped file names. Judging from the group sizes, I think the file names contain a date stamp that is temporarily changed from DDMMYYYY to YYYYMMDD, which makes sense: in YYYYMMDD order a plain character-by-character sort is also chronological.
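To see the whole line at work, here is a sketch that runs the pipeline from the question (with $name quoted) on three invented file names, under the DDMMYYYY assumption above. The directory, names and pattern are all hypothetical:

```shell
#!/bin/sh
# Sketch: the question's pipeline on made-up files dated 31.12.2013,
# 01.02.2014 and 15.01.2014 (DDMMYYYY in characters 7-14, assumed).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

name='4711'   # hypothetical search pattern
for f in databc31122013.p databc01022014.p databc15012014.p; do
  printf 'order %s\n' "$name" > "$f"   # every file contains the pattern
done

namecmp=$(grep -l "$name" dat*.p |
  sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g' |
  sort -t '.' -k 1.7,1.14 |
  sed -e 's/^\(......\)\(....\)\(..\)\(..\)\(.*\)/\1\4\3\2\5/g' |
  tail -1)

echo "$namecmp"   # databc01022014.p (1 Feb 2014 is the latest date)
```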

There are libraries designed to generate strings that match a given regex (e.g. Xeger), but for this case an example should be enough:

abcdef02122014foobarfoobarfoobar
^     ^ ^ ^   ^
|     | | |   |
1     2 3 4   5

becomes

abcdef20141202foobarfoobarfoobar
^     ^   ^ ^ ^
|     |   | | |
1     4   3 2 5

and then sort orders the lines by characters 7 to 14 of the part before the first "." (the rearranged date), and the next sed simply puts all of the above back in order.

So it seems the regular expressions are used to temporarily change the format of lines for sorting, before restoring the original format.
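A sketch of why the rearrangement matters, using invented names that again assume a DDMMYYYY date in characters 7 to 14: sorting the raw names compares the day first, while the rearranged names compare year, then month, then day.

```shell
#!/bin/sh
# Made-up file names, one per line.
names='databc31122013.p
databc01022014.p
databc15012014.p'

# -t '.' makes the text before the first '.' field 1, and
# -k 1.7,1.14 compares only characters 7 to 14 of that field.
printf '%s\n' "$names" | sort -t '.' -k 1.7,1.14 | tail -1
# databc31122013.p  (day first: "31" sorts after "01" and "15")

printf '%s\n' "$names" |
  sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/' |
  sort -t '.' -k 1.7,1.14 | tail -1
# databc20140201.p  (latest date, still in rearranged form)
```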

echo "1111112233444456789" | sed -e "s/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g"

-> 1111114444332256789

Explanation:

Begin     111111      22      33      4444      56789
^         \(......\)  \(..\)  \(..\)  \(....\)  \(.*\)
          \1          \2      \3      \4        \5
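To check which characters end up in which group, the replacement can label the groups instead of reordering them (a quick sketch, not part of the original script):

```shell
#!/bin/sh
# Same pattern as the question; the replacement just labels each group.
echo "1111112233444456789" |
  sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/1=\1 2=\2 3=\3 4=\4 5=\5/'
# 1=111111 2=22 3=33 4=4444 5=56789
```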

Optimization:

  • The trailing \(.*\) is not needed, because sed leaves the unmatched rest of the line in place; the corresponding \5 must then be removed as well
  • The trailing g is not needed either: only one substitution is possible, because ^ anchors the match to the start of the line
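A quick sketch comparing the shortened expression with the original on the same input; both should print the same line:

```shell
#!/bin/sh
line="1111112233444456789"

# Original expression from the question.
orig=$(echo "$line" | sed -e 's/^\(......\)\(..\)\(..\)\(....\)\(.*\)/\1\4\3\2\5/g')
# Shortened: no trailing \(.*\) and \5, no g flag.
short=$(echo "$line" | sed -e 's/^\(......\)\(..\)\(..\)\(....\)/\1\4\3\2/')

echo "$orig"    # 1111114444332256789
echo "$short"   # 1111114444332256789
```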
