Regex: {n} and {n,m} ignore the maximum number of repetitions

Question

I have a question about the maximum number of repetitions in regular expressions: {n} and {n,m}.

$ man grep
...
Repetition
    A regular expression may be followed by one of several repetition operators:
...
    {n}    The preceding item is matched exactly n times.
    {n,}   The preceding item is matched n or more times.
    {,m}   The preceding item is matched at most m times.  This is a GNU extension.
    {n,m}  The preceding item is matched at least n times, but not more than m times.
...

Now consider a test file:

$ cat ./sample.txt
1
12
123
1234

Then grep it for [0-9] (a digit) repeated exactly 2 times:

$ grep "[0-9]\{2\}" ./sample.txt
12
123
1234

??? Why does this include 123 and 1234?

Also, I grepped the same text file for a digit repeated at least 2 times but no more than 3 times:

$ grep "[0-9]\{2,3\}" ./sample.txt
12
123
1234

??? Why is "1234" returned?

An obvious workaround is to pipe grep into an inverted grep (grep -v) to filter out the excess results. For example:

$ grep "[0-9]\{2,\}" ./sample.txt | grep -v "[0-9]\{4,\}"
12
123
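The workaround keeps any line that has at least one run of 2 or more digits and no run of 4 or more, which is not quite the same as "the line is 2 or 3 digits". A minimal sketch comparing it with a pattern anchored at both ends of the line; the extra line ab12 is a hypothetical addition, not part of the original sample.txt:

```shell
#!/bin/sh
# Sample from the question plus one hypothetical line (ab12) that mixes letters and digits.
sample=$(mktemp)
printf '1\n12\n123\n1234\nab12\n' > "$sample"

# The question's workaround: keep lines with a run of 2+ digits, then drop lines with a run of 4+.
workaround=$(grep '[0-9]\{2,\}' "$sample" | grep -v '[0-9]\{4,\}')

# Anchored alternative: the whole line must consist of 2 or 3 digits.
anchored=$(grep '^[0-9]\{2,3\}$' "$sample")

printf 'workaround:\n%s\nanchored:\n%s\n' "$workaround" "$anchored"
rm -f "$sample"
```

Both print 12 and 123, but the workaround also keeps ab12, because its digit run has the right length even though the line contains other characters.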

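Can anyone help me understand why {n} returns lines containing the pattern repeated more than n times? And why does {n,m} return the pattern repeated more than m times?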

Answer 1

Unless you anchor a regular expression, it can match anywhere in the string.

$ grep "[0-9]\{2\}" ./sample.txt will match any line that contains 2 consecutive digits anywhere in it, so 123 and 1234 match as well.
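To see which part of each line actually satisfied the unanchored pattern, a small sketch using grep's -o (print only the matched text) and -n (prefix the line number) options; the temporary file just recreates the question's sample.txt:

```shell
#!/bin/sh
# Recreate the question's sample.txt in a temporary file.
sample=$(mktemp)
printf '1\n12\n123\n1234\n' > "$sample"

# Any two adjacent digits satisfy the unanchored [0-9]\{2\}, wherever they sit on the line.
matches=$(grep -n -o '[0-9]\{2\}' "$sample")
printf '%s\n' "$matches"
rm -f "$sample"
```

Line 3 matches on its first two digits, and line 4 produces two separate matches (12 and 34), so both lines are reported even though the bound is "exactly 2".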

Use ^ to force the expression to start at the beginning of the line, and $ to force it to match up to the end of the line. For example:

$ grep '^[0-9]\{2\}$' ./sample.txt
# Using single quotes to avoid potential substitution issues. Hat tip to @ghoti

This should return only 12.
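The same anchoring fixes the {2,3} case from the question. A minimal sketch; it also shows grep's -x option (match whole lines only), which is an equivalent way to anchor both ends and is not part of the original answer:

```shell
#!/bin/sh
# Recreate the question's sample.txt in a temporary file.
sample=$(mktemp)
printf '1\n12\n123\n1234\n' > "$sample"

# ^ and $ pin the match to the whole line, so the bound now counts every digit on it.
exactly_two=$(grep '^[0-9]\{2\}$' "$sample")
two_to_three=$(grep '^[0-9]\{2,3\}$' "$sample")

# -x requires the pattern to match the entire line, like wrapping it in ^...$.
two_to_three_x=$(grep -x '[0-9]\{2,3\}' "$sample")

printf 'exactly 2:\n%s\n2 to 3:\n%s\n' "$exactly_two" "$two_to_three"
rm -f "$sample"
```

With the anchors in place, {2} returns only 12 and {2,3} returns 12 and 123; 1234 is no longer reported.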

Answer 2

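A pattern can be found inside a longer text, and that text can continue with more characters matching the same pattern. With grep, use the -o option to see where the regex found its matches. Two digits can be found in a two-digit number, but they can also be found in a number that is 10 digits long.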
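Applying -o to the {2,3} command from the question shows why 1234 is returned. A small sketch (the -n flag, which prefixes line numbers, is an addition for readability):

```shell
#!/bin/sh
# Recreate the question's sample.txt in a temporary file.
sample=$(mktemp)
printf '1\n12\n123\n1234\n' > "$sample"

# Each match is a run of 2 or 3 digits; the rest of the line is not constrained.
matches=$(grep -n -o '[0-9]\{2,3\}' "$sample")
printf '%s\n' "$matches"
rm -f "$sample"
```

On line 4 the leftmost match is as long as the bound allows (123), and the leftover 4 is too short to match again. The line is printed because it contains a match, not because the whole line satisfies the {2,3} bound.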

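Other answers point to the two anchors, but there is also the word-boundary token \b, which matches at a boundary position. Used at both ends of the pattern, it closes the match off on both sides. Unfortunately, POSIX BRE (grep's default regex flavor) does not support it, but in GNU grep you can enable Perl-compatible regexes with -P and test it: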

grep -P '\b[0-9]{2}\b' file

With plain grep, the pair \< and \> matches the same positions (the start and the end of a word, respectively):

grep '\<[0-9]\{2\}\>' file
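Word boundaries matter when a line holds several numbers rather than one. A minimal sketch on a hypothetical line (not from the question), assuming GNU grep, where \< and \> are supported extensions:

```shell
#!/bin/sh
# Hypothetical input: several numbers separated by spaces on one line.
line='1 12 123 1234'

# Without boundaries, every pair of adjacent digits counts, even inside longer numbers.
loose=$(printf '%s\n' "$line" | grep -o '[0-9]\{2\}')

# \< and \> force the two digits to be a whole word on their own.
bounded=$(printf '%s\n' "$line" | grep -o '\<[0-9]\{2\}\>')

printf 'loose:\n%s\nbounded:\n%s\n' "$loose" "$bounded"
```

The unbounded pattern finds 12 three times plus 34, while the bounded one finds only the standalone 12.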

Question description

2 answers

Answer 1
5 2018-05-23 19:13:44

Answer 2
1 2018-05-23 19:30:55