
Bash script grep for pattern in variable of text

I have a variable which contains text; I can echo it to stdout so I think the variable is fine. My problem is trying to grep for a pattern in that variable of text. Here is what I am trying:

ERR_COUNT=`echo $VAR_WITH_TEXT | grep "ERROR total: (\d+)"`

When I echo $ERR_COUNT the variable appears to be empty, so I must be doing something wrong.

How do I do this properly? Thanks.

EDIT - Just wanted to mention that testing this pattern against the example text in my variable does give me a match (I tested with: http://rubular.com).

However, the regex could still be wrong.

EDIT2 - Not getting any results yet, so here's the string I'm working with:

ALERT line125: Alert: Cannot locate any description for 'asdf' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../hgfd.controls) ALERT line126: Alert: Cannot locate any description for 'zxcv' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../dfhg.controls) ALERT line127: Alert: Cannot locate any description for 'rtyu' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../kjgh.controls) [1] 22280 IGNORE total: 0 WARN total: 0 ALERT total: 3 ERROR total: 23 [1] + Done /tool/pandora/bin/gvim -u NONE -U NONE -nRN -c runtime! plugin/**/*.vim -bg ...

That's the string, so hopefully there should be no ambiguity anymore... I want to extract the number "23" (after "ERROR total: ") into a variable and I'm having a hard time haha.

Cheers

You can use bash's =~ operator to extract the value.

[[ $VAR_WITH_TEXT =~ ERROR\ total:\ ([0-9]+) ]]

Note that you have to escape the spaces, or quote only the fixed parts of the regular expression:

[[ $VAR_WITH_TEXT =~ "ERROR total: "([0-9]+) ]]

since quoting any of the metacharacters causes them to be treated literally.
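Here is a minimal sketch of that difference, assuming bash 3.2 or later (older bash treated a quoted right-hand side differently). The variable names `text`, `partly_quoted` and `fully_quoted` are only for illustration. Quoting just the fixed text keeps `([0-9]+)` as a live regex group. Quoting the whole pattern turns `(`, `[`, `]`, `+` and `)` into literal characters, so the match fails.

```shell
#!/usr/bin/env bash
# Sketch: how quoting changes the meaning of the =~ right-hand side (bash 3.2+).
text='ERROR total: 23'

# Only the fixed text is quoted; ([0-9]+) stays a regex capture group.
if [[ $text =~ "ERROR total: "([0-9]+) ]]; then
  partly_quoted="matched ${BASH_REMATCH[1]}"
else
  partly_quoted="no match"
fi

# Quoting everything makes ( [ ] + ) literal characters, so bash
# looks for the exact text "ERROR total: ([0-9]+)" and does not find it.
if [[ $text =~ "ERROR total: ([0-9]+)" ]]; then
  fully_quoted="matched"
else
  fully_quoted="no match"
fi

echo "partly quoted: $partly_quoted"   # partly quoted: matched 23
echo "fully quoted: $fully_quoted"     # fully quoted: no match
```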

You can also save the regex in a variable:

regex="ERROR total: ([0-9]+)"
[[ $VAR_WITH_TEXT =~ $regex ]]

In either case, once the expression matches, the parenthesized subexpression can be found in the BASH_REMATCH array.

ERR_COUNT=${BASH_REMATCH[1]}

(Element 0 contains the entire text matched by the regular expression; the parenthesized subexpressions are found in the remaining elements, in the order they appear in the full regex.)
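Putting the pieces together, a sketch run against a shortened copy of the sample string from the question might look like this. Guarding the assignment with `if` is my own addition, so that `ERR_COUNT` is explicitly left empty when the text contains no `ERROR total:` at all; `whole_match` is a hypothetical name used only to show element 0.

```shell
#!/usr/bin/env bash
# Sketch: extract the ERROR count with =~ and BASH_REMATCH.
# VAR_WITH_TEXT holds a shortened copy of the sample output from the question.
VAR_WITH_TEXT='[1] 22280 IGNORE total: 0 WARN total: 0 ALERT total: 3 ERROR total: 23 [1] + Done'

regex='ERROR total: ([0-9]+)'
if [[ $VAR_WITH_TEXT =~ $regex ]]; then
  # Element 0 is the whole match, element 1 the first parenthesized group.
  whole_match=${BASH_REMATCH[0]}
  ERR_COUNT=${BASH_REMATCH[1]}
else
  # No match: set both to empty instead of relying on earlier values.
  whole_match=''
  ERR_COUNT=''
fi

echo "$whole_match"   # ERROR total: 23
echo "$ERR_COUNT"     # 23
```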


If you want to use grep, you'll need a version that accepts Perl-style regexes, such as GNU grep with -P:

ERR_COUNT=$( echo "$VAR_WITH_TEXT" | grep -Po "(?<=ERROR total: )\d+" )

Since you need Perl-style regexes anyway to get the look-behind assertion, you can use \d instead of [0-9].
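If your grep has no -P (BSD/macOS grep, for instance), a swapped-in alternative, not the method above, is to run grep -o twice with extended regexes: the first pass keeps only `ERROR total: <digits>`, the second keeps only the digits. This sketch assumes your grep supports -o and -E, which GNU and BSD grep both do. Without -o, grep prints the whole matching line, which here would be the entire text.

```shell
#!/usr/bin/env bash
# Sketch: a two-pass grep -o fallback for greps without -P.
# Assumes grep supports -o and -E (GNU and BSD grep both do).
VAR_WITH_TEXT='ALERT total: 3 ERROR total: 23 [1] + Done'

# Pass 1 keeps only "ERROR total: <digits>"; pass 2 keeps only the digits.
ERR_COUNT=$(printf '%s\n' "$VAR_WITH_TEXT" \
  | grep -oE 'ERROR total: [0-9]+' \
  | grep -oE '[0-9]+')

echo "$ERR_COUNT"   # 23
```

If the text can contain several `ERROR total:` entries, each number ends up on its own line in `ERR_COUNT`; appending `| head -n 1` would keep only the first.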

Your error is in the pattern: in grep's default (basic) regex syntax, (\d+) matches:

  • '('
  • a digit (if your grep understands \d at all)
  • '+'
  • ')'

According to your comment, what you want is \(\d\+\), which:

  • defines a sub-pattern with \( ... \)
    • inside it, matches at least one (\+) digit (\d).

In this case, if you don't need a sub-pattern, you can just drop the \( and \).

Note: if your grep doesn't understand \d, you can replace it with [0-9]. The easiest way to check is to run grep '\d' and type in a couple of test lines.
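The check suggested in that note can also be scripted. This sketch feeds a single line containing a digit (and no letter d) to grep in its default basic mode; `grep_digit` is a hypothetical variable name. stderr is discarded because newer GNU grep versions may print a warning about the stray backslash. The result depends on your grep, so no particular answer is implied here.

```shell
#!/usr/bin/env bash
# Sketch: does this grep treat \d as a digit in its default (basic) mode?
# The input "7" contains no letter d, so a match can only mean "digit".
if printf '%s\n' '7' | grep -q '\d' 2>/dev/null; then
  grep_digit='yes'
else
  grep_digit='no'   # \d means something else here, or grep rejected it
fi

echo "grep treats \\d as a digit: $grep_digit"
```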

# setting example data
test="adfa\nfasetrfaqwe\ndsfa ERROR total: 32514235dsfaewrf"

One solution:

sed -n 's/^.*ERROR total: \([0-9][0-9]*\).*$/\1/p' <<< "$test"
32514235

Another solution:

# throw away everything up to (and including) the last "ERROR total: "
test=${test##*ERROR total: }
# cut from behind at the first non-digit, so the number may be
# followed directly by letters, a space, or anything else
test=${test%%[!0-9]*}
echo "$test"
32514235
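One caveat with this approach: if "ERROR total: " is not in the text, `${test##*ERROR total: }` removes nothing and you get back the original text (or some leading digits of it). A sketch that checks for the marker first, wrapped in a hypothetical helper named `extract_error_count`, could look like this:

```shell
#!/usr/bin/env bash
# Sketch: the same ## / %% trimming in a hypothetical helper that
# first checks the marker is present, so a missing marker gives no output.
extract_error_count() {
  local text=$1 marker='ERROR total: '
  case $text in
    *"$marker"*) ;;            # marker present: continue
    *) return 1 ;;             # marker missing: print nothing, fail
  esac
  text=${text##*"$marker"}     # drop everything up to the last marker
  text=${text%%[!0-9]*}        # drop everything from the first non-digit on
  [[ -n $text ]] || return 1   # marker was not followed by a digit
  printf '%s\n' "$text"
}

ERR_COUNT=$(extract_error_count 'ALERT total: 3 ERROR total: 23 [1] + Done')
echo "$ERR_COUNT"                                        # 23
extract_error_count 'no count here' || echo 'not found'  # not found
```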

\d is probably only recognized as a digit in Perl regex mode, so you probably want to use grep -P.

If you only want the number, you could try:

ERR_COUNT=$(echo $VAR_WITH_TEXT | perl -pe "s/.*ERROR total: (\\d+).*/\\1/g")

or:

ERR_COUNT=$(echo $VAR_WITH_TEXT | sed -nE 's/.*ERROR total: ([0-9]+).*/\1/p')
