I have two problems, one of them is a regex

Question

I am updating some code that I didn't write and part of it is a regex as follows:

\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]

I understand that .*? does a non-greedy match of everything in the second register.

What does ?:\s* in the first and third registers do?

Update: As requested, language is C# on .NET 3.5

Answer 1

The syntax (?:...) is a non-capturing group: a way of putting parentheses around a subexpression without separately extracting that part of the string.

The author wanted to capture the (.*?) part in the middle and didn't want the optional whitespace at the beginning or the end to take up capture groups of its own. As written, you can use \1 or $1 (or whatever the appropriate method is in your particular language) to refer to the domain name. With plain parentheses, \1 would instead refer to the first chunk of spaces at the beginning of the string.
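To make the difference concrete, here is a minimal C# sketch comparing the pattern from the question with a variant that uses plain parentheses. The class name `UrlTagGroups`, the constant names, and the `Capturing` variant are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlTagGroups
{
    // The pattern from the question: only (.*?) is a capturing group.
    public const string NonCapturing = @"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]";

    // Hypothetical variant with plain parentheses, shown for comparison only.
    public const string Capturing = @"\[url(\s*)\]www\.(.*?)\[/url(\s*)\]";

    public static void Main()
    {
        string input = "[url  ]www.foo.com[/url]";

        Match m1 = Regex.Match(input, NonCapturing);
        // Groups[0] is always the whole match, so Count is 2 here:
        // the whole match plus the single capturing group.
        Console.WriteLine(m1.Groups.Count);     // 2
        Console.WriteLine(m1.Groups[1].Value);  // foo.com

        Match m2 = Regex.Match(input, Capturing);
        // Now the whitespace groups are numbered too, so the domain moves to group 2.
        Console.WriteLine(m2.Groups.Count);     // 4
        Console.WriteLine(m2.Groups[2].Value);  // foo.com
    }
}
```

In the second pattern, `Groups[1]` holds the two spaces after `url`, which is the situation the non-capturing syntax avoids.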

Answer 2

?: makes the parentheses non-capturing. They still group the subexpression, but its match is not stored. In that regex, you'll only pull out one piece of information, $1, which contains the text matched by the middle (.*?) expression.

Answer 3

What does ?:\s* in the first and third registers do?

It's matching zero or more whitespace characters, without capturing them.

The regex author intends to allow trailing whitespace in the square-bracket-tags, matching all DNS labels following the "www." like so:

[url]www.foo.com[/url]     # foo.com
[url  ]www.foo.com[/url  ] # same
[url  ]www.foo.com[/url]   # same
[url]www.foo.com[/url  ]   # same

Note that the regex also matches:

[url]www.[/url]      # empty string!

and fails to match

[url]stackoverflow.com[/url]  # no match, bummer
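The cases above can be checked with a short C# sketch along these lines. The class name `UrlTagCases` and the helper `Capture` are hypothetical names chosen for this example. The pattern itself is copied from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlTagCases
{
    private static readonly Regex UrlTag =
        new Regex(@"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]");

    // Returns the text captured after "www.", or null when the input does not match.
    public static string Capture(string input)
    {
        Match m = UrlTag.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "[url]www.foo.com[/url]",
            "[url  ]www.foo.com[/url  ]",
            "[url  ]www.foo.com[/url]",
            "[url]www.foo.com[/url  ]",
            "[url]www.[/url]",              // matches, but captures an empty string
            "[url]stackoverflow.com[/url]"  // no "www." prefix, so no match
        };

        foreach (string input in inputs)
        {
            string captured = Capture(input);
            Console.WriteLine("{0} -> {1}", input,
                captured == null ? "(no match)" : "\"" + captured + "\"");
        }
    }
}
```

If the empty capture is unwanted, changing (.*?) to (.+?) would be one way to require at least one character, though whether that suits the surrounding code is a judgment call for whoever maintains it.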

Answer 4

You may find this Regular Expressions Cheat Sheet very helpful (hopefully). I spent ages trying to learn Regex with no luck. And once I read this cheat-sheet - I immediately understood what I previously failed to learn.

http://krijnhoetmer.nl/stuff/regex/cheat-sheet/

4 answers

solution1
9 ACCEPTED 2009-08-24 01:16:18

solution2
4 2009-08-24 01:16:46

solution3
2 2009-08-24 02:50:11

solution4
1 2009-08-24 01:17:47