I saw this regular expression applied to a URL:
$url = 'http://www.domain.com/';
preg_match('/(http)(.*?)\n/', $url, $matches);
I am not sure what the question mark "?" does in this regex. According to regex manuals, "?" is a metacharacter equivalent to {0,1}. So what is the point of putting "?" after *, since * already represents {0,}?
Can someone please enlighten me? Thanks.
It has a different meaning when it follows another quantifier.
In this case it changes the matching behaviour of the preceding quantifier. The default behaviour is greedy, and the ? changes it to "ungreedy" (also called lazy).
"Greedy" means match as much as possible
"Ungreedy" means match as little as possible
See the article on regular-expressions.info
For example:
a.+b
will match "aabxb" in aabxb
a.+?b
will match only "aab" in aabxb
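The difference above can be checked directly in PHP. This is only a minimal sketch; the variable names $subject, $greedy and $lazy are made up for illustration:

```php
<?php
$subject = 'aabxb';

// Greedy: .+ takes as much as it can, then backtracks to the last "b".
preg_match('/a.+b/', $subject, $greedy);

// Lazy: .+? takes as little as it can and stops at the first "b" that lets the match succeed.
preg_match('/a.+?b/', $subject, $lazy);

echo $greedy[0], "\n"; // aabxb
echo $lazy[0], "\n";   // aab
```

Both patterns start matching at the same position; only the amount consumed by the quantifier differs.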
See the example here on Regexr
You may be interested in my blog post about this topic: You do know Quantifiers. Really?
About your regex
preg_match('/(http)(.*?)\n/', $url, $matches);
I don't think it makes a difference here. The . matches anything but newline characters by default (you can change this by adding an s modifier after the closing regex delimiter), so whether the question mark is there or not, it will match only up to the first \n.
If you change the behaviour by using preg_match('/(http)(.*?)\n/s', $url, $matches); it will make a difference: .*\n would match up to the last \n, and .*?\n will stop at the first \n.
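To see the effect of the s modifier, here is a small sketch that runs all four combinations. The multi-line $text is a hypothetical input invented for this example (the $url in the question contains no newline at all), and the result variables $a to $d are likewise just illustrative names:

```php
<?php
// Hypothetical input: a URL followed by two more lines.
$text = "http://www.domain.com/\nsecond line\nthird line\n";

preg_match('/(http)(.*)\n/', $text, $a);    // greedy, . does not match newlines
preg_match('/(http)(.*?)\n/', $text, $b);   // lazy,   . does not match newlines
preg_match('/(http)(.*)\n/s', $text, $c);   // greedy, . also matches newlines
preg_match('/(http)(.*?)\n/s', $text, $d);  // lazy,   . also matches newlines

// Without s, greedy and lazy capture the same text.
var_dump($a[2] === $b[2]);            // bool(true)

// With s, the greedy version runs on to the last newline.
echo substr_count($c[0], "\n"), "\n"; // 3
echo substr_count($d[0], "\n"), "\n"; // 1
```

With the s modifier the lazy version still stops at the first newline, while the greedy one swallows every line up to the final newline.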
In this case, the question mark means a "stingy" match. It will stop matching as soon as the first \n is encountered, whereas a greedy .* that is allowed to match newlines (the s modifier) would gobble up the intervening \n characters until the last one.
More about greedy and stingy matching at http://www.perl.com/doc/FMTEYEWTK/regexps.html