Bash script grep for pattern in variable of text

Question

I have a variable which contains text; I can echo it to stdout so I think the variable is fine. My problem is trying to grep for a pattern in that variable of text. Here is what I am trying:

ERR_COUNT=`echo $VAR_WITH_TEXT | grep "ERROR total: (\d+)"`

When I echo $ERR_COUNT the variable appears to be empty, so I must be doing something wrong.

How do I do this properly? Thanks.

EDIT - Just wanted to mention that testing that pattern on the example text I have in the variable does give me something (I tested with: http://rubular.com )

However the regex could still be wrong.

EDIT2 - Not getting any results yet, so here's the string I'm working with:

ALERT line125: Alert: Cannot locate any description for 'asdf' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../hgfd.controls) ALERT line126: Alert: Cannot locate any description for 'zxcv' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../dfhg.controls) ALERT line127: Alert: Cannot locate any description for 'rtyu' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../kjgh.controls) [1] 22280 IGNORE total: 0 WARN total: 0 ALERT total: 3 ERROR total: 23 [1] + Done /tool/pandora/bin/gvim -u NONE -U NONE -nRN -c runtime! plugin/**/*.vim -bg ...

That's the string, so hopefully there should be no ambiguity anymore... I want to extract the number "23" (after "ERROR total: ") into a variable and I'm having a hard time haha.

Cheers

Answer 1

You can use bash's =~ operator to extract the value.

[[ $VAR_WITH_TEXT =~ ERROR\ total:\ ([0-9]+) ]]

Note that you have to escape the spaces, or quote only the fixed parts of the regular expression:

[[ $VAR_WITH_TEXT =~ "ERROR total: "([0-9]+) ]]

since quoting any of the metacharacters causes them to be treated literally.

You can also save the regex in a variable:

regex="ERROR total: ([0-9]+)"
[[ $VAR_WITH_TEXT =~ $regex ]]

In any case, once the expression matches, the text captured by the parenthesized subexpression can be found in the BASH_REMATCH array.

ERR_COUNT=${BASH_REMATCH[1]}

(The zeroth element contains the text matched by the entire regular expression; the text matched by each parenthesized subexpression is found in the remaining elements, in the order the subexpressions appear in the full regex.)
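
Putting the pieces together, a minimal sketch of the whole approach might look like the following. The sample text is a shortened, assumed version of the string from the question. Checking the result of [[ ... ]] before reading BASH_REMATCH avoids picking up a value left over from an earlier match.

```shell
#!/usr/bin/env bash
# Sample data: a shortened, assumed version of the text from the question.
VAR_WITH_TEXT='IGNORE total: 0 WARN total: 0 ALERT total: 3 ERROR total: 23 [1] + Done'

# Keeping the regex in a variable avoids having to escape the spaces.
regex='ERROR total: ([0-9]+)'

if [[ $VAR_WITH_TEXT =~ $regex ]]; then
    # BASH_REMATCH[0] is the whole match, [1] is the first capture group.
    ERR_COUNT=${BASH_REMATCH[1]}
else
    # No match: use an empty value rather than whatever was there before.
    ERR_COUNT=''
fi

echo "ERR_COUNT=$ERR_COUNT"
```

With the sample text above this prints ERR_COUNT=23.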

If you want to use grep, you'll need a version that accepts Perl-style regexes (the -P option in GNU grep); the -o option prints only the matched text.

ERR_COUNT=$( echo "$VAR_WITH_TEXT" | grep -Po "(?<=ERROR total: )\d+" )

Since you already need Perl-style regexes for the look-behind assertion, you can also write \d in place of [0-9].
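
Some grep builds do not support -P. A possible fallback, sketched here under the assumption that -o and -E are available (both GNU and BSD grep provide them, although -o is not in POSIX), is to chain two extended-regex matches instead of using a look-behind. The sample text is an assumed, shortened version of the string from the question.

```shell
# Sample data: a shortened, assumed version of the text from the question.
VAR_WITH_TEXT='ALERT total: 3 ERROR total: 23 [1] + Done'

# The first grep keeps only "ERROR total: 23";
# the second keeps just the digits at the end of that fragment.
ERR_COUNT=$(printf '%s\n' "$VAR_WITH_TEXT" |
    grep -oE 'ERROR total: [0-9]+' |
    grep -oE '[0-9]+$')

echo "$ERR_COUNT"
```

Quoting "$VAR_WITH_TEXT" matters here: unquoted, the shell would split the text into words and expand any glob characters it contains before grep ever sees it.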

Answer 2

Your error is in the pattern. In grep's default (basic regex) mode, unescaped ( and + are literal characters, so (\d+) matches:

- '('
- a digit
- '+'
- ')'

According to your comment, what you want is \(\d\+\), which:

- defines a sub-pattern with \( ... \)
- inside it, matches at least one ( \+ ) digit ( \d )

In this case, if you don't need a sub-pattern, you can just drop the \( and \).

Note: if your grep doesn't understand \d, you can replace it with [0-9]. The easiest way to find out is to run grep '\d' against a couple of test lines.
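
The difference between the escaped and unescaped forms can be checked with a small experiment. This is only a sketch: the sample line is made up, and grep -c is used to count matching lines so the result is easy to compare. Note that without -o, grep prints the whole matching line, not the sub-pattern.

```shell
line='ALERT total: 3 ERROR total: 23'

# Basic regex (the default): group with \( \), repeat with \{1,\}.
# GNU grep also accepts \+ as an extension.
bre=$(printf '%s\n' "$line" | grep -c 'ERROR total: \([0-9]\{1,\}\)')

# Extended regex (-E): plain ( ) and + are the operators.
ere=$(printf '%s\n' "$line" | grep -cE 'ERROR total: ([0-9]+)')

# Basic regex with unescaped ( ) and +: they are literal characters,
# so this only matches text such as "ERROR total: (2+)".
literal=$(printf '%s\n' "$line" | grep -c 'ERROR total: ([0-9]+)')

echo "bre=$bre ere=$ere literal=$literal"
```

The first two patterns match the sample line; the third does not, which is the same kind of failure as in the question.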

Answer 3

# setting example data
test="adfa\nfasetrfaqwe\ndsfa ERROR total: 32514235dsfaewrf"

one solution:

sed -n 's/^.*ERROR total: \([0-9]*\).*$/\1/p' <<< "$test"
32514235

other solution:

# throw away everything up to and including "ERROR total: "
test=${test##*ERROR total: }
# cut from the first non-digit onward, leaving only the number
test=${test%%[!0-9]*}
echo "$test"
32514235
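
The parameter-expansion approach can be wrapped in a small function so the original variable is left untouched. This is a sketch only: extract_err_count is a hypothetical helper name, and trimming at the first non-digit (rather than at the first space) is a choice made here so that the number is isolated even when other text follows it directly.

```shell
# Hypothetical helper: print the digits that follow "ERROR total: " in $1.
extract_err_count() {
    case $1 in
        *'ERROR total: '*) ;;   # marker present, carry on
        *) return 1 ;;          # marker absent, report failure
    esac
    # Drop everything up to and including the last "ERROR total: ".
    rest=${1##*'ERROR total: '}
    # Drop everything from the first non-digit character onward.
    printf '%s\n' "${rest%%[!0-9]*}"
}

extract_err_count 'dsfa ERROR total: 32514235dsfaewrf'
extract_err_count 'ALERT total: 3 ERROR total: 23 [1] + Done'
```

No external process is started, which can matter if the extraction runs in a tight loop.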

Answer 4

The \d escape is probably only recognized as a digit in Perl regex mode, so you probably want to use grep -P.

If you only want the number you could try:

ERR_COUNT=$(echo "$VAR_WITH_TEXT" | perl -pe 's/.*ERROR total: (\d+).*/$1/')

or:

ERR_COUNT=$(echo "$VAR_WITH_TEXT" | sed -nE 's/.*ERROR total: ([0-9]+).*/\1/p')
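
A minimal sketch of the sed variant, assuming a sed that supports -E for extended regexes (GNU and BSD sed both do). The sample strings are made up. The second call shows what happens when the text has no "ERROR total:" entry.

```shell
VAR_WITH_TEXT='ALERT total: 3 ERROR total: 23 [1] + Done'

# -n suppresses the default output; the p flag prints only lines where the
# substitution succeeded, so input without a match yields an empty result.
ERR_COUNT=$(printf '%s\n' "$VAR_WITH_TEXT" |
    sed -nE 's/.*ERROR total: ([0-9]+).*/\1/p')

NO_MATCH=$(printf '%s\n' 'no totals here' |
    sed -nE 's/.*ERROR total: ([0-9]+).*/\1/p')

echo "ERR_COUNT=$ERR_COUNT NO_MATCH=${NO_MATCH:-<empty>}"
```

Without -n and p, sed would print a non-matching line unchanged, and ERR_COUNT would end up holding the whole input instead of a number.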

4 answers

solution1 6 ACCEPTED 2012-08-07 02:49:11
solution2 3 2012-08-06 22:30:57
solution3 1 2012-08-06 22:27:13
solution4 1 2012-08-06 22:30:02
