Does Awk support regular expression quantifiers \{m,n\} or \{m\} or \{m,\}?

I want to print every column (field) in a file that contains a 10-digit mobile number.

I tried this:

awk '/[0-9]\{10\}/{for(i=1;i<=NF;++i)if($i~/[0-9]\{10\}/)print $i}' filename

but this does not seem to work.

I want to do it using only Awk.

Example text in the file:

named 9898664511 nameb \n
namea nameb namec 7788992121 \n
namec named 7665544213 named \n
namea namec namef nameg namek 9090876534\n

Yes, it does in GNU awk! You just don't have to escape the braces:

$ awk 'BEGIN{v=10; if (v~/10{2}/) print "yes"}'

$ awk 'BEGIN{v=100; if (v~/10{2}/) print "yes"}'
yes

So your regular expression should look like this instead:

/[0-9]{10}/

Given your sample input, it yields this:

$ awk '/[0-9]{10}/ {for (i=1;i<=NF;i++) if ($i ~ /[0-9]{10}/) print $i}' n
9898664511
7788992121
7665544213
9090876534\n

Note that the last field, 9090876534\n, is printed as well: the unanchored pattern matches any field that contains 10 consecutive digits. To match only fields consisting of exactly 10 digits, anchor the pattern with ^ (start of the field) and $ (end of the field):

$ awk '/[0-9]{10}/ {for (i=1;i<=NF;i++) if ($i ~ /^[0-9]{10}$/) print $i}' n
9898664511
7788992121
7665544213

From The GNU Awk User's Guide → 3.3 Regular Expression Operators:

{n}
{n,}
{n,m}

One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regexp is repeated n times. If there are two numbers separated by a comma, the preceding regexp is repeated n to m times. If there is one number followed by a comma, then the preceding regexp is repeated at least n times:

 wh{3}y 

Matches 'whhhy', but not 'why' or 'whhhhy'.

 wh{3,5}y 

Matches 'whhhy', 'whhhhy', or 'whhhhhy' only.

 wh{2,}y 

Matches 'whhy', 'whhhy', and so on.

Interval expressions were not traditionally available in awk. They were added as part of the POSIX standard to make awk and egrep consistent with each other.

Initially, because old programs may use '{' and '}' in regexp constants, gawk did not match interval expressions in regexps.

However, beginning with version 4.0, gawk does match interval expressions by default. This is because compatibility with POSIX has become more important to most gawk users than compatibility with old programs.

For programs that use '{' and '}' in regexp constants, it is good practice to always escape them with a backslash. Then the regexp constants are valid and work the way you want them to, using any version of awk.

Finally, when '{' and '}' appear in regexp constants in a way that cannot be interpreted as an interval expression (such as /q{a}/), then they stand for themselves.
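
As a small illustration of the escaping advice quoted above, here is a sketch that matches braces literally. The variable name and the sample strings are made up for the example:

```shell
# Escaped braces stand for themselves, so only the inputs that really
# contain "q{a}" or "wh{3}y" as literal text are printed.
literal=$(printf '%s\n' 'q{a}' 'qa' 'wh{3}y' 'whhhy' |
  awk '/q\{a\}/ || /wh\{3\}y/')

printf '%s\n' "$literal"
```

Here 'whhhy' is not printed: with the braces escaped, wh\{3\}y is no longer an interval expression and only matches the literal text wh{3}y.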
