
Regular Expression: {n} and {n,m} ignore the maximum number of repetitions

I have a question about the maximum number of repetitions in a regex: {n} and {n,m}.

$ man grep
...
Repetition
    A regular expression may be followed by one of several repetition operators:
...
    {n}    The preceding item is matched exactly n times.
    {n,}   The preceding item is matched n or more times.
    {,m}   The preceding item is matched at most m times.  This is a GNU extension.
    {n,m}  The preceding item is matched at least n times, but not more than m times.
...

Now consider a test file:

$ cat ./sample.txt
1
12
123
1234

Then grep it for [0-9] (a digit) repeated exactly 2 times:

$ grep "[0-9]\{2\}" ./sample.txt
12
123
1234

Why did this include 123 and 1234?

I also grep the same text file for digits repeated at least 2 times but not more than 3 times:

$ grep "[0-9]\{2,3\}" ./sample.txt
12
123
1234

Why does this return "1234"?

An obvious workaround is to chain grep with a reverse grep (grep -v) to filter out the excess results. For example:

$ grep "[0-9]\{2,\}" ./sample.txt | grep -v "[0-9]\{4,\}"
12
123

Can anyone help me understand why {n} returns lines in which the pattern repeats more than n times? And why does {n,m} return lines in which it repeats more than m times?

Unless you anchor your regular expressions, they can match anywhere in a string.

The command grep "[0-9]\{2\}" ./sample.txt matches any line that contains two consecutive digits anywhere, which is why 123 and 1234 are printed as well.
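A minimal sketch of this behavior, using made-up input lines instead of the question's sample.txt (an assumption for illustration): only the line with two adjacent digits matches, wherever those digits sit in the line.

```shell
# Hypothetical input: "a12b" has two adjacent digits in the middle,
# "1x2" has two digits that are not adjacent, "7" has only one digit.
unanchored=$(printf 'a12b\n1x2\n7\n' | grep '[0-9]\{2\}')
echo "$unanchored"   # prints: a12b
```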

Use ^ to force your expression to start at the beginning of a line and $ to force it to match through to the end of the line. For example:

$ grep '^[0-9]\{2\}$' ./sample.txt
# Using single quotes to avoid potential substitution issues. Hat tip to @ghoti

This returns only 12.
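Applied to the question's sample data (recreated here with printf rather than read from ./sample.txt, so the sketch is self-contained), anchoring also fixes the {2,3} case. The sketch additionally uses grep's -x option, which the answer does not mention: it selects only lines that the pattern matches in full, the same effect as wrapping the pattern in ^ and $.

```shell
# Emit the same four lines as the question's sample.txt.
sample() { printf '1\n12\n123\n1234\n'; }

# ^ and $ force the repetition to span the entire line.
exact=$(sample | grep '^[0-9]\{2\}$')
range=$(sample | grep '^[0-9]\{2,3\}$')

# -x: the whole line must match, equivalent to ^...$ here.
range_x=$(sample | grep -x '[0-9]\{2,3\}')

echo "$exact"     # 12
echo "$range"     # 12, then 123
echo "$range_x"   # same two lines as $range
```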

A pattern can match a substring of longer text, not only text of exactly the same length. With grep, use the -o option to see where the regex found each match. Two digits can be found in a two-digit number, but also inside a 10-digit number.
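A sketch of the -o check on the question's sample data (again generated with printf, an assumption for self-containment). Each match is printed on its own line, which shows that 1234 satisfies [0-9]\{2\} twice (12 and 34) and [0-9]\{2,3\} once (123, leaving the trailing 4 unmatched). Note that -o is not part of POSIX grep, though GNU grep supports it.

```shell
sample() { printf '1\n12\n123\n1234\n'; }

# One output line per non-overlapping match.
two=$(sample | grep -o '[0-9]\{2\}')
two_three=$(sample | grep -o '[0-9]\{2,3\}')

echo "$two"         # 12, 12, 12, 34
echo "$two_three"   # 12, 123, 123
```

Every line except "1" contains at least one match, so plain grep (without -o) prints all three of 12, 123 and 1234 in both cases.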

The other answer points to the two anchors, but there is also a word-boundary token, \b, that matches the position between a word character and a non-word character (or the start or end of the line). Putting it on both sides closes off both ends of the match. POSIX BRE (grep's default regex flavor) doesn't define \b, although GNU grep accepts it as an extension. With GNU grep you can also enable Perl-compatible regular expressions using -P and test it:

grep -P '\b[0-9]{2}\b' file

Without -P, \< and \> match the start and the end of a word, which has the same effect:

grep '\<[0-9]\{2\}\>' file
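A sketch contrasting word boundaries with line anchors, assuming GNU grep and hypothetical input lines: \<...\> accepts a two-digit number that stands alone as a word, even with other text on the line, whereas ^...$ would accept only the line that is exactly two digits. Word characters are letters, digits and underscore, so "x12" is one word and does not match.

```shell
# Hypothetical input lines.
words=$(printf 'id 12\nid 123\nx12\n12\n' | grep '\<[0-9]\{2\}\>')
echo "$words"   # prints "id 12", then "12"
```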
