I have a variable which contains text; I can echo it to stdout so I think the variable is fine. My problem is trying to grep for a pattern in that variable of text. Here is what I am trying:
ERR_COUNT=`echo $VAR_WITH_TEXT | grep "ERROR total: (\d+)"`
When I echo $ERR_COUNT the variable appears to be empty, so I must be doing something wrong.
How to do this properly? Thanks.
EDIT - Just wanted to mention that testing that pattern on the example text I have in the variable does give me something (I tested with: http://rubular.com )
However the regex could still be wrong.
EDIT2 - Not getting any results yet, so here's the string I'm working with:
ALERT line125: Alert: Cannot locate any description for 'asdf' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../hgfd.controls) ALERT line126: Alert: Cannot locate any description for 'zxcv' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../dfhg.controls) ALERT line127: Alert: Cannot locate any description for 'rtyu' in the qwer.xml hierarchy. (due to (?i-xsm:\\balert?\\b) ALERT in ../kjgh.controls) [1] 22280 IGNORE total: 0 WARN total: 0 ALERT total: 3 ERROR total: 23 [1] + Done /tool/pandora/bin/gvim -u NONE -U NONE -nRN -c runtime! plugin/**/*.vim -bg ...
That's the string, so hopefully there should be no ambiguity anymore... I want to extract the number "23" (after "ERROR total: ") into a variable and I'm having a hard time haha.
Cheers
You can use bash's =~
operator to extract the value.
[[ $VAR_WITH_TEXT =~ ERROR\ total:\ ([0-9]+) ]]
Note that you have to escape the spaces, or only only quote the fixed parts of the regular expression:
[[ $VAR_WITH_TEXT =~ "ERROR total: "([0-9]+) ]]
since quoting any of the metacharacters causes them to be treated literally.
You can also save the regex in a variable:
regex="ERROR total: ([0-9]+)"
[[ $VAR_WITH_TEXT =~ $regex ]]
In any case, once the expression matches, the parenthesized expression can be found in BASH_REMATCH
array.
ERR_COUNT=${BASH_REMATCH[1]}
(The zeroth element contains the entire matched regular expression; the parenthesized subexpressions are found in the remaining elements in the order they appear in the full regex.)
If you want to use grep
, you'll need a version that can accept Perl-style regexes.
ERR_COUNT=$( echo "$VAR_WITH_TEXT" | grep -Po "(?<=ERROR total: )\d+" )
As long as you need to use Perl-style regexes to enable the look-behind assertion, you can replace [0-9]
with \\d
.
Your error is in the pattern: (\\d+)
matches:
'('
'+'
')'
According to your comment, what you want is \\(\\d\\+\\)
, which:
\\( ... \\)
\\+
) digit ( \\d
). In this case, if you don't need a sub-pattern, you can just drop the \\(
and \\)
.
Note: if your grep
doesn't understand \\d
, you can replace it by [0-9]
. Easiest way is to write grep '\\d'
and test it by writing a couple test lines.
# setting example data
test="adfa\nfasetrfaqwe\ndsfa ERROR total: 32514235dsfaewrf"
one solution:
echo $(sed -n 's/^.*ERROR total: \([0-9]*\).*$/\1/p' < <(echo $test))
32514235
other solution:
# throw away everything up to "ERROR total: "
test=${test##*ERROR total: }
# cut from behind assuming number contains no spaces and is
# separated by space
test=${test%% *}
echo $test
32514235
The \\d
is probably only recognized as a digit in perl regex mode, you probably want to use grep -P
.
If you only want the number you could try:
ERR_COUNT=$(echo $VAR_WITH_TEXT | perl -pe "s/.*ERROR total: (\\d+).*/\\1/g")
or:
ERR_COUNT=$(echo $VAR_WITH_TEXT | sed -n "s/.*ERROR total: ([0-9]+).*/\\1/gp")
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.