
awk regular expression pattern matching doesn't work

I tried to use [:digit:] to match digits in a line. Here is the code:

~ echo -e "abc\n123\ndef" | awk '{/[[:digit:]]/{print $0}}'
awk: syntax error at source line 1
 context is
     >>> {/[[:digit:]]/{ <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1

My question is:

1. Why use [[:digit:]] instead of [:digit:]?

2. Why won't this code snippet run, and how should I modify it?

The pattern must go outside the braces, not inside them:

awk '/[[:digit:]]/{print $0}'

The syntax is:

 awk 'condition{execute if the condition is true}'

In some cases the condition alone is sufficient, because printing the record is the default action. Here, this would be enough:

awk '/[[:digit:]]/'

Example:

$ echo -e "abc\n123\ndef" | awk '/[[:digit:]]/'
123

Why use [[:digit:]] instead of [:digit:]?

The POSIX character class [:digit:] won't match a digit on its own; it has to be placed inside a bracket expression, as in [[:digit:]].
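To see what a bare [:digit:] does instead, here is a small sketch. It assumes the awk in use reads /[:digit:]/ as an ordinary bracket expression listing the characters :, d, i, g and t, which is how the implementations I know of behave (gawk also prints a warning, hence the 2>/dev/null). printf replaces echo -e only for portability.

```shell
# A bare [:digit:] is read as the character list ":digt", not as "any digit".
out=$(printf 'abc\n123\ndef\n' | awk '/[:digit:]/' 2>/dev/null)
printf '%s\n' "$out"   # expected: def (it contains "d"); 123 is not matched
```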

If you want to match a digit as well as the + symbol, you can extend the bracket expression like this:

[+[:digit:]]
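A quick way to try that expression out; the sample input is made up for illustration:

```shell
# Lines containing a digit or a plus sign are printed; the others are skipped.
out=$(printf 'abc\na+b\n123\ndef\n' | awk '/[+[:digit:]]/')
printf '%s\n' "$out"   # prints a+b and 123
```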

Awk syntax is:

<condition> { <action> }

where <action> is executed if <condition> is true for the current record. What you wrote is:

{ <condition> { <action> } }

See the difference? You can put a condition inside an action block but then you'd need to surround it with the appropriate control keywords like if or while so awk would know what you want to do with that condition:

{ if (<condition>) { <action> } }
{ while (<condition>) { <action> } }
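Applied to the question's input, the if form might look like the sketch below. It is only meant to show that the condition can live inside the action block once it is wrapped in if; the shorter pattern-action form is still preferable.

```shell
# Inside an action block, a bare /regexp/ is shorthand for $0 ~ /regexp/.
out=$(printf 'abc\n123\ndef\n' | awk '{ if (/[[:digit:]]/) { print $0 } }')
printf '%s\n' "$out"   # prints 123
```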

So, instead of:

{/[[:digit:]]/{print $0}}

to be syntactically and idiomatically correct you should have written:

/[[:digit:]]/{print $0}

but since printing $0 is the default action all you'd really write is:

/[[:digit:]]/

i.e.:

$ echo -e "abc\n123\ndef" | awk '/[[:digit:]]/'
123

As for why [[:digit:]] instead of [:digit:] :

[:digit:] is a POSIX character class and as such can be used inside a bracket expression as part of a regexp, eg [[:digit:]] , just like a range expression ( 0-9 ) or character list ( 0123456789 ) could alternatively be used inside a bracket expression to the same effect.
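As a rough check of that equivalence, the three forms can be run side by side. This assumes plain ASCII input; how a range such as 0-9 is interpreted can depend on the locale, so treat it as a sketch rather than a guarantee.

```shell
input=$(printf 'abc\n123\ndef\n')

# Character class, range expression, and character list inside a bracket expression.
out_class=$(printf '%s\n' "$input" | awk '/[[:digit:]]/')
out_range=$(printf '%s\n' "$input" | awk '/[0-9]/')
out_list=$(printf '%s\n' "$input" | awk '/[0123456789]/')

printf '%s\n' "$out_class" "$out_range" "$out_list"   # prints 123 three times
```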

This example might help clarify: [:digit:] is a character class and so is [:punct:] and so [[:digit:][:punct:] \t] is a bracket expression containing 2 character classes and a character list ( \t ).
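A sketch of a bracket expression combining two character classes with a space and a tab, run against made-up input. It assumes awk reads \t inside the bracket expression as a tab character, which POSIX awk and the common implementations do.

```shell
# Matches any line containing a digit, a punctuation character, a space, or a tab.
out=$(printf 'abc\na1\na!b\na b\nx\ty\n' | awk '/[[:digit:][:punct:] \t]/')
printf '%s\n' "$out"   # every line except abc
```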

From POSIX ( http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm ):

A character class expression is expressed as a character class name enclosed within bracket-<colon> ( "[:" and ":]" ) delimiters.

and

A bracket expression (an expression enclosed in square brackets, "[]" ) ... is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ..., character classes, .....

So a character class is [:<name>:] and a bracket expression is [<expression>] where <expression> can be/contain a character class: [[:<name>:]] .

PS WARNING: There is a commonly referenced website, http://www.regular-expressions.info/posixbrackets.html, that has the terminology of character classes and bracket expressions completely wrong. Or maybe it would be fairer to say that the terminology they use is vague at best, since they refer to a POSIX bracket expression as a "character class" but then also refer to a POSIX character class as a "character class". However you want to characterize it, as they state themselves on their site, their terminology is certainly NOT the same terminology that POSIX uses for bracket expressions and character classes.
