
bash script regex matching

In my bash script, I have an array of filenames like

files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

I need to extract the characters between the underscore and the .xml extension so that I can loop through them for use in a function.

If this were python, I might use something like

re.match(r"site_(.*)\.xml", filename)

and then extract the first matched group.

Unfortunately this project needs to be in bash, so how can I do this kind of thing in a bash script? I'm not very good with grep, sed, or awk.

Something like the following should work

files2=("${files[@]#site_}")   # Strip the leading site_ from each element
files3=("${files2[@]%.xml}")   # Strip the trailing .xml from each element

EDIT: After correcting those two typos, it does seem to work :)
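For something closer to the Python regex in the question, bash's own `[[ string =~ regex ]]` operator is another option: the captured groups end up in the `BASH_REMATCH` array. The following is only a sketch under a few assumptions: `process_site` is a made-up stand-in for whatever function the names are meant for, and the pattern is kept in a variable to sidestep quoting differences between bash versions.

```bash
#!/usr/bin/env bash
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

# Hypothetical placeholder for the function mentioned in the question.
process_site() {
    printf 'processing %s\n' "$1"
}

names=()
re='^site_(.*)\.xml$'          # same idea as the Python pattern, anchored at both ends
for f in "${files[@]}"; do
    if [[ $f =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match, [1] is the first group
        names+=( "${BASH_REMATCH[1]}" )
    fi
done

for name in "${names[@]}"; do
    process_site "$name"
done
```

Filenames that don't match the pattern are simply skipped, which the plain prefix/suffix stripping above does not do (it leaves them unchanged).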

xbraer@NO01601 ~
$ VAR=`echo "site_hello.xml" | sed -e 's/.*_\(.*\)\.xml/\1/g'`

xbraer@NO01601 ~
$ echo $VAR
hello

xbraer@NO01601 ~
$

Does this answer your question?

Just run the variables through sed in backticks (``)

I don't remember the array syntax in bash, but I guess you know that well enough yourself, if you're programming bash ;)

If it's unclear, don't hesitate to ask again. :)
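As a rough sketch of how the sed command above could be applied to the array from the question: the loop below feeds each element through the same expression and collects the results. `sed_names` is just an illustrative variable name, and `$(...)` is used in place of backticks since it does the same job and nests more easily.

```bash
#!/usr/bin/env bash
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

sed_names=()
for f in "${files[@]}"; do
    # Same sed expression as above, applied to one element at a time
    sed_names+=( "$(printf '%s\n' "$f" | sed -e 's/.*_\(.*\)\.xml/\1/')" )
done

printf '%s\n' "${sed_names[@]}"
```

Note that `.*_` is greedy, so a name with several underscores keeps only the part after the last one.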

I'd use cut to split the string.

for i in site_hello.xml site_test.xml site_live.xml; do echo $i | cut -d'.' -f1 | cut -d'_' -f2; done

This can also be done in awk:

for i in site_hello.xml site_test.xml site_live.xml; do echo $i | awk -F'.' '{print $1}' | awk -F'_' '{print $2}'; done
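The two-stage pipelines above can probably be collapsed into a single awk call by giving it a bracket expression as the field separator, so that both `_` and `.` split the line. This is a sketch that assumes each name contains exactly one underscore and no dots other than the one before the extension; `awk_names` is an illustrative variable name.

```bash
#!/usr/bin/env bash
# -F'[_.]' splits on either character, so site_hello.xml becomes: site, hello, xml
awk_names=$(printf '%s\n' site_hello.xml site_test.xml site_live.xml |
    awk -F'[_.]' '{print $2}')

printf '%s\n' "$awk_names"
```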

If you're using arrays, you probably should not be using bash.

A more appropriate example would be:

ls site_*.xml | sed 's/^site_//' | sed 's/\.xml$//'

This produces output consisting of the parts you wanted. Backtick or redirect as needed.
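If the names really do come from files on disk, a variation on the same idea is to let the shell glob do the listing and strip the prefix and suffix with parameter expansion, which avoids parsing `ls` output. This is only a sketch; `list_sites` is a hypothetical helper name and the directory argument is an assumption about how it would be called.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the part between site_ and .xml
# for every site_*.xml file in the given directory.
list_sites() {
    local dir=$1 f name
    for f in "$dir"/site_*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        name=${f##*/}             # drop the directory part
        name=${name#site_}        # drop the leading site_
        name=${name%.xml}         # drop the trailing .xml
        printf '%s\n' "$name"
    done
}
```

Calling `list_sites .` in the directory holding the files prints one name per line, in the shell's glob order.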
