bash script regex matching

Question

In my bash script, I have an array of filenames like

files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

I need to extract the characters between the underscore and the .xml extension so that I can loop through them for use in a function.

If this were Python, I might use something like

re.match(r"site_(.*)\.xml", filename)

and then extract the first matched group.

Unfortunately, this project needs to be in bash. How can I do this kind of thing in a bash script? I'm not very good with grep, sed, or awk.
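For reference, bash 3.0 and later can do this regex match without any external tool: the `=~` operator inside `[[ ]]` matches an extended regular expression and stores the capture groups in the `BASH_REMATCH` array, which is the closest analogue of the Python match group. This technique is not taken from any of the answers below; it is a minimal sketch assuming the `files` array shown above, and `pattern` and `names` are illustrative variable names.

```shell
#!/usr/bin/env bash
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

# Keep the regex in a variable and leave it unquoted after =~.
# In bash 3.2 and later, a quoted right-hand side is matched as a
# literal string instead of a regex.
pattern='^site_(.*)\.xml$'

names=()
for f in "${files[@]}"; do
    if [[ $f =~ $pattern ]]; then
        # BASH_REMATCH[0] is the whole match; BASH_REMATCH[1] is the
        # first capture group, like match.group(1) in Python.
        names+=( "${BASH_REMATCH[1]}" )
    fi
done

printf '%s\n' "${names[@]}"
```

Names that do not match the pattern are simply skipped by the `if`.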

Answer 1

Something like the following should work:

files2=("${files[@]#site_}")   # Strip the leading site_ from each element
files3=("${files2[@]%.xml}")   # Strip the trailing .xml

EDIT: After correcting those two typos, it does seem to work :)
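Put together with the loop the question asks for, the parameter-expansion approach might look like the sketch below. `process_site` is a hypothetical stand-in for the question's function; quoting the expansions keeps each element intact even if a filename contains spaces.

```shell
#!/usr/bin/env bash
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

# ${array[@]#prefix} removes the shortest leading match of "prefix" from
# every element; ${array[@]%suffix} removes the shortest trailing match.
files2=( "${files[@]#site_}" )
files3=( "${files2[@]%.xml}" )

# Hypothetical placeholder for the function mentioned in the question.
process_site() {
    echo "processing $1"
}

for name in "${files3[@]}"; do
    process_site "$name"
done
```

This prints one `processing ...` line per name and starts no extra processes, since the stripping is done entirely by the shell.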

Answer 2

xbraer@NO01601 ~
$ VAR=`echo "site_hello.xml" | sed -e 's/.*_\(.*\)\.xml/\1/g'`

xbraer@NO01601 ~
$ echo $VAR
hello

xbraer@NO01601 ~
$

Does this answer your question?

Just run the variables through sed in backticks (``)

I don't remember the array syntax in bash, but I guess you know that well enough yourself if you're programming bash ;)

If it's unclear, don't hesitate to ask again. :)
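To fill in the array syntax this answer leaves out, here is a minimal sketch that runs the same sed expression over each element of the question's `files` array. It uses `$(...)`, the nestable equivalent of backticks, and collects the results in `names`, an illustrative variable name. Because the leading `.*_` is greedy, a name with several underscores keeps only the part after the last one.

```shell
#!/usr/bin/env bash
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

names=()
for f in "${files[@]}"; do
    # Same substitution as in the transcript; the g flag is not needed
    # because the pattern matches the whole line at most once.
    names+=( "$(printf '%s\n' "$f" | sed -e 's/.*_\(.*\)\.xml/\1/')" )
done

printf '%s\n' "${names[@]}"
```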

Answer 3

I'd use cut to split the string.

for i in site_hello.xml site_test.xml site_live.xml; do echo "$i" | cut -d'.' -f1 | cut -d'_' -f2; done

This can also be done in awk:

for i in site_hello.xml site_test.xml site_live.xml; do echo "$i" | awk -F'.' '{print $1}' | awk -F'_' '{print $2}'; done
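A variation on the awk approach (not part of the original answer) is to let a single awk process handle every name by giving it a bracket-expression field separator. This sketch assumes the question's `files` array and that the middle part itself contains no `_` or `.`; `names` is an illustrative variable name.

```shell
#!/usr/bin/env bash
files=( "site_hello.xml" "site_test.xml" "site_live.xml" )

# With -F'[_.]', "site_hello.xml" splits into "site", "hello", "xml",
# so field 2 is the part between the underscore and the extension.
names=()
while IFS= read -r name; do
    names+=( "$name" )
done < <(printf '%s\n' "${files[@]}" | awk -F'[_.]' '{ print $2 }')

printf '%s\n' "${names[@]}"
```

Feeding the loop through process substitution rather than a pipe keeps `names` in the current shell, so the array is still populated after the loop ends.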

Answer 4

If you're using arrays, you probably should not be using bash.

A more appropriate example would be:

ls site_*.xml | sed -e 's/^site_//' -e 's/\.xml$//'

This produces output consisting of the parts you wanted. Backtick or redirect as needed.
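If the files really are on disk, a sketch that captures the stripped names into an array might look like this. It swaps `ls` for a shell glob, which avoids parsing `ls` output, and the demo setup (a temporary directory with three empty sample files) exists only so the example runs on its own; `demo_dir` and `names` are illustrative names.

```shell
#!/usr/bin/env bash
# Demo setup only: create three empty sample files in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch site_hello.xml site_test.xml site_live.xml

# nullglob makes the loop run zero times if nothing matches,
# instead of iterating once over the literal string "site_*.xml".
shopt -s nullglob
names=()
for f in site_*.xml; do
    # A single sed with two -e expressions does both substitutions.
    names+=( "$(printf '%s\n' "$f" | sed -e 's/^site_//' -e 's/\.xml$//')" )
done

# Clean up the demo directory.
cd - >/dev/null || exit 1
rm -r -- "$demo_dir"

printf '%s\n' "${names[@]}"
```

The glob expands in sorted order, so this prints `hello`, `live`, `test`.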

Answer summary (4 answers)

Answer 1: score 5, posted 2011-08-01 19:51:31

Answer 2: score 2, accepted, posted 2011-08-01 19:47:51

Answer 3: score 0, posted 2011-08-01 19:51:42

Answer 4: score 0, posted 2011-08-01 20:08:28
