Regular Expression: {n} and {n,m} ignore the maximum number of repetitions

I have a question about the maximum number of repetitions in a regex: {n} and {n,m}.

$ man grep
...
Repetition
    A regular expression may be followed by one of several repetition operators:
...
    {n}    The preceding item is matched exactly n times.
    {n,}   The preceding item is matched n or more times.
    {,m}   The preceding item is matched at most m times.  This is a GNU extension.
    {n,m}  The preceding item is matched at least n times, but not more than m times.
...

Now consider a test file:

$ cat ./sample.txt
1
12
123
1234

Then grep it for [0-9] (digits) repeated exactly 2 times:

$ grep "[0-9]\{2\}" ./sample.txt
12
123
1234

? Why did this include 123 and 1234?

Also, I grep the same text file for digits repeating at least 2 times but not more than 3 times:

$ grep "[0-9]\{2,3\}" ./sample.txt
12
123
1234

??? Why does this return "1234"?

An obvious workaround is to use grep and reverse-grep to filter out the excess results. For example,

$ grep "[0-9]\{2,\}" ./sample.txt | grep -v "[0-9]\{4,\}"
12
123

Can anyone help me understand why {n} returns lines where the pattern repeats more than n times, and why {n,m} returns lines where it repeats more than m times?

Unless you anchor your regular expressions, they can match anywhere in a string.

grep "[0-9]\{2\}" ./sample.txt will match any line that contains 2 consecutive digits anywhere in it, so 123 and 1234 qualify as well.

Use ^ to force your expression to start at the beginning of a line and $ to force it to match up to the end of a line, e.g.

$ grep '^[0-9]\{2\}$' ./sample.txt
# Using single quotes to avoid potential substitution issues. Hat tip to @ghoti

This should return only 12.
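The same anchoring covers the {n,m} case from the question. Here is a minimal sketch, assuming the sample.txt contents shown in the question (recreated in a temporary file; the variable names are only for illustration). It also uses -x, which asks grep to match whole lines only and should behave like adding ^ and $ by hand:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '1\n12\n123\n1234\n' > "$sample"

# Anchored at both ends: the whole line must consist of 2 to 3 digits.
anchored=$(grep '^[0-9]\{2,3\}$' "$sample")
printf '%s\n' "$anchored"

# -x requires the pattern to match the entire line,
# which has the same effect as the explicit ^ and $ above.
whole_line=$(grep -x '[0-9]\{2,3\}' "$sample")
printf '%s\n' "$whole_line"

rm -f "$sample"
```

Both commands should print 12 and 123, and leave out 1 and 1234.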

A pattern can match inside a longer string, and a longer string can contain the pattern more than once. With grep, use the -o option to see exactly which part of the line the regex matched. Two digits can be found in a two-digit number just as well as inside a 10-digit number.
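To see this on the question's data, here is a small sketch that assumes a grep with the -o option (GNU and BSD grep have it; it is not required by POSIX) and recreates the sample file in a temporary location:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '1\n12\n123\n1234\n' > "$sample"

# -o prints each matched substring on its own line instead of the whole line.
matches=$(grep -o '[0-9]\{2\}' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

This should print 12 three times (once each for the lines 12, 123 and 1234) followed by 34, the second non-overlapping match in 1234. The regex always matches exactly two digits; the surrounding digits are simply not part of the match.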

The other answer points to the two line anchors, but there is also a word boundary token, \b, which matches at the position between a word character and a non-word character. Putting one on each side closes both ends of the match. Unfortunately POSIX BRE (grep's default regex flavor) doesn't define \b, but in GNU grep you can enable Perl-compatible regular expressions with -P and test it:

grep -P '\b[0-9]{2}\b' file

With grep's default flavor, GNU grep also accepts the pair \< and \>, which match the same positions (start of word and end of word respectively):

grep '\<[0-9]\{2\}\>' file
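Word boundaries matter most when the number is not alone on its line. A minimal sketch, assuming GNU grep (for \< and \>) and a made-up three-line input that is not part of the question:

```shell
# Hypothetical input: numbers embedded in other text.
lines=$(mktemp)
printf 'id 12 ok\nid 123 ok\nab12cd\n' > "$lines"

# \< and \> require the two digits to form a complete word on their own.
bounded=$(grep '\<[0-9]\{2\}\>' "$lines")
printf '%s\n' "$bounded"

rm -f "$lines"
```

Only "id 12 ok" should be printed: in 123 the two digits are followed or preceded by another digit, and in ab12cd they are surrounded by letters, so no word boundary exists at either end. The anchored form ^[0-9]\{2\}$ would match none of these lines, because it requires the digits to be the entire line.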
