
Does Awk support regular expression quantifiers \{m,n\} or \{m\} or \{m,\}?

I want to print, from a file, every column (field) that contains a 10-digit mobile number.

I tried this:

awk '/[0-9]\{10\}/{for(i=1;i<=NF;++i)if($i~/[0-9]\{10\}/)print $i}' filename

but it does not seem to work.

I want to do this using only awk.

Example text in the file:

named 9898664511 nameb \n
namea nameb namec 7788992121 \n
namec named 7665544213 named \n
namea namec namef nameg namek 9090876534\n

Yes, GNU awk supports them (by default since version 4.0), but you must not escape the braces:

$ awk 'BEGIN{v=10; if (v~/10{2}/) print "yes"}'

$ awk 'BEGIN{v=100; if (v~/10{2}/) print "yes"}'
yes

So your regular expression should be like this instead:

/[0-9]{10}/

Given your sample input, it would yield this:

$ awk '/[0-9]{10}/ {for (i=1;i<=NF;i++) if ($i ~ /[0-9]{10}/) print $i}' n
9898664511
7788992121
7665544213
9090876534\n

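The last field, 9090876534\n, is also printed because in your sample text the characters \n follow the number directly, so that field contains ten digits plus two more characters. To print only fields consisting of exactly 10 digits, anchor the regex with ^ (start) and $ (end):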

$ awk '/[0-9]{10}/ {for (i=1;i<=NF;i++) if ($i ~ /^[0-9]{10}$/) print $i}' n
9898664511
7788992121
7665544213
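If your awk does not support interval expressions at all (they were not traditionally part of awk), the same filter can be written portably by checking the field's length and its content separately. This is a minimal sketch; the function name ten_digit_fields is hypothetical, chosen only for illustration.

```shell
# ten_digit_fields: hypothetical helper name. Prints every field that is
# exactly 10 ASCII digits, without interval expressions, so it also runs
# on awks that lack {n} support. Reads the named files, or stdin if none.
ten_digit_fields() {
    awk '{
        # length() checks the size; the anchored regex checks that
        # every character is a digit.
        for (i = 1; i <= NF; i++)
            if (length($i) == 10 && $i ~ /^[0-9]+$/)
                print $i
    }' "$@"
}

# Sample text from the question (the \n pairs are literal characters).
printf '%s\n' \
    'named 9898664511 nameb \n' \
    'namea nameb namec 7788992121 \n' \
    'namec named 7665544213 named \n' \
    'namea namec namef nameg namek 9090876534\n' | ten_digit_fields
# prints 9898664511, 7788992121 and 7665544213, one per line
```

The length test rejects 9090876534\n (12 characters), and the anchored /^[0-9]+$/ rejects 10-character fields that contain non-digits.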

From The GNU Awk User's Guide, section 3.3, Regular Expression Operators:

{n}

{n,}

{n,m}

One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regexp is repeated n times. If there are two numbers separated by a comma, the preceding regexp is repeated n to m times. If there is one number followed by a comma, then the preceding regexp is repeated at least n times:

 wh{3}y 

Matches 'whhhy', but not 'why' or 'whhhhy'.

 wh{3,5}y 

Matches 'whhhy', 'whhhhy', or 'whhhhhy' only.

 wh{2,}y 

Matches 'whhy', 'whhhy', and so on.
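To make these semantics concrete, the sketch below spells each interval out using only plain repetition, ? and *, which also works on awks without interval support. expand_interval is a hypothetical helper, and it assumes the repeated atom is a single character, a bracket expression, or a parenthesized group. The patterns are anchored with ^ and $ so that each one must match a whole word, as in the examples above.

```shell
# expand_interval(atom, n, m) is a hypothetical helper that rewrites
# atom{n,m} without braces:
#   atom{n}   -> n copies of atom                (call with m = n)
#   atom{n,}  -> n copies of atom, then atom*    (call with m < 0)
#   atom{n,m} -> n copies of atom, then (m - n) copies of atom?
# Assumes atom is one character, a bracket expression or a (group).
interval_demo() {
    awk '
    function expand_interval(atom, n, m,    re, k) {
        re = ""
        for (k = 0; k < n; k++)
            re = re atom
        if (m < 0)
            return re atom "*"
        for (k = n; k < m; k++)
            re = re atom "?"
        return re
    }
    BEGIN {
        nw = split("why whhy whhhy whhhhy whhhhhy whhhhhhy", words, " ")
        # Anchored with ^ and $ so each pattern must match a whole word.
        name[1] = "wh{3}y";   pat[1] = "^w" expand_interval("h", 3, 3)  "y$"
        name[2] = "wh{3,5}y"; pat[2] = "^w" expand_interval("h", 3, 5)  "y$"
        name[3] = "wh{2,}y";  pat[3] = "^w" expand_interval("h", 2, -1) "y$"
        for (p = 1; p <= 3; p++) {
            line = name[p] ":"
            for (w = 1; w <= nw; w++)
                if (words[w] ~ pat[p])
                    line = line " " words[w]
            print line
        }
    }'
}

interval_demo
# wh{3}y: whhhy
# wh{3,5}y: whhhy whhhhy whhhhhy
# wh{2,}y: whhy whhhy whhhhy whhhhhy whhhhhhy
```

For example, wh{3,5}y expands to whhhh?h?y: three mandatory h characters followed by two optional ones.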

Interval expressions were not traditionally available in awk. They were added as part of the POSIX standard to make awk and egrep consistent with each other.

Initially, because old programs may use '{' and '}' in regexp constants, gawk did not match interval expressions in regexps.

However, beginning with version 4.0, gawk does match interval expressions by default. This is because compatibility with POSIX has become more important to most gawk users than compatibility with old programs.

For programs that use '{' and '}' in regexp constants, it is good practice to always escape them with a backslash. Then the regexp constants are valid and work the way you want them to, using any version of awk.
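This is also why the command in the question prints nothing: an escaped brace is an ordinary character, so /[0-9]\{10\}/ matches a digit followed by the literal text {10}, not ten digits in a row. A small sketch shows this; the function name escaped_brace_demo is illustrative only.

```shell
# An escaped brace is an ordinary character in an awk regexp, so the
# pattern from the question looks for a digit followed by the text {10}.
# escaped_brace_demo is an illustrative name.
escaped_brace_demo() {
    printf '%s\n' '9898664511' 'x5{10}' | awk '/[0-9]\{10\}/'
}

escaped_brace_demo
# prints only: x5{10}
```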

Finally, when '{' and '}' appear in regexp constants in a way that cannot be interpreted as an interval expression (such as /q{a}/), then they stand for themselves.
