Regular expressions: {n} and {n,m} ignore the maximum repetition count

Question

I have a question about the maximum repetition count in regular expressions: {n} and {n,m}.

$ man grep
...
Repetition
    A regular expression may be followed by one of several repetition operators:
...
    {n}    The preceding item is matched exactly n times.
    {n,}   The preceding item is matched n or more times.
    {,m}   The preceding item is matched at most m times.  This is a GNU extension.
    {n,m}  The preceding item is matched at least n times, but not more than m times.
...

Now consider a test file:

$ cat ./sample.txt
1
12
123
1234

Then grep it for [0-9] (a digit) repeated exactly 2 times:

$ grep "[0-9]\{2\}" ./sample.txt
12
123
1234

? Why does this include 123 and 1234?

Next, I grep the same text file for digits repeated at least 2 but no more than 3 times:

$ grep "[0-9]\{2,3\}" ./sample.txt
12
123
1234

??? Why is "1234" returned?

An obvious workaround is to combine grep with an inverted grep (grep -v) to filter out the excess results. For example:

$ grep "[0-9]\{2,\}" ./sample.txt | grep -v "[0-9]\{4,\}"
12
123

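Can anyone help me understand why {n} returns lines in which the pattern repeats more than n times, and why {n,m} returns lines in which it repeats more than m times?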

Answer 1

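Unless you anchor a regular expression, it can match anywhere in the string.

The command grep "[0-9]\{2\}" ./sample.txt matches any line that contains two consecutive digits anywhere in it.

Use ^ to force the expression to start at the beginning of the line, and $ to force it to end at the end of the line. For example: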

$ grep '^[0-9]\{2\}$' ./sample.txt
# Using single quotes to avoid potential substitution issues. Hat tip to @ghoti

This should return only 12.
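
The same anchoring idea also covers the {2,3} case from the question. The following is a minimal sketch, not part of the original answer: it recreates sample.txt in a temporary file and compares explicit ^...$ anchors with grep's -x option, which only accepts a match that spans the whole line. The variable names sample, exact_two and two_to_three are illustrative.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 1 12 123 1234 > "$sample"

# Anchored at both ends: only lines made of exactly two digits.
exact_two=$(grep '^[0-9]\{2\}$' "$sample")

# -x requires the match to cover the whole line, so no explicit anchors are needed.
two_to_three=$(grep -x '[0-9]\{2,3\}' "$sample")

echo "$exact_two"      # prints 12
echo "$two_to_three"   # prints 12 and 123, one per line
rm -f "$sample"
```

With either form, 1234 is no longer reported, because four digits cannot fit between the two ends of the line when at most three repetitions are allowed.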

Answer 2

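A pattern can be found inside a longer text, even one that goes on to repeat the very same pattern. With grep, use the -o option to see where the regex finds its matches. Two digits can be found in a two-digit number just as well as in a 10-digit number.

Other answers point to the two anchors, but there is also a word-boundary marker, \b, that matches at a word boundary; used at both ends, it closes off the match on each side. Unfortunately POSIX BRE (grep's default regex flavor) does not support it, but in GNU grep you can enable Perl regular expressions and test it: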
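
To make the -o suggestion concrete, here is a small sketch run against the question's data (the temporary file and the variable names sample and matches are illustrative; -o is not part of POSIX grep, though GNU and BSD grep both provide it):

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 1 12 123 1234 > "$sample"

# -o prints each matched substring on its own line instead of the whole input line.
matches=$(grep -o '[0-9]\{2,3\}' "$sample")

echo "$matches"   # prints 12, 123, 123 (one per line)
rm -f "$sample"
```

The third 123 comes from the line 1234: the pattern matches its first three digits, and that partial match is enough for grep to report the line.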

grep -P '\b[0-9]{2}\b' file

With plain grep, the pair \< and \> matches at the same positions:

grep '\<[0-9]\{2\}\>' file
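
Word boundaries matter most when the number is embedded in a longer line, where ^ and $ cannot be used. The sketch below uses a made-up input file (its contents and the names text and bounded are assumptions for illustration) and relies on \< and \> being available, as they are in GNU grep:

```shell
# Hypothetical input: numbers embedded in longer lines of text.
text=$(mktemp)
printf '%s\n' 'id 12 ok' 'id 1234 ok' 'code 345' > "$text"

# \< and \> pin the match to the start and end of a word, so a run of
# four digits cannot satisfy a pattern that allows at most three.
bounded=$(grep -o '\<[0-9]\{2,3\}\>' "$text")

echo "$bounded"   # prints 12 and 345, one per line
rm -f "$text"
```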

Answer 1
5 2018-05-23 19:13:44

Answer 2
1 2018-05-23 19:30:55
