Regex matching to extract multi-line text regions (C#)

Question

I'm looking to capture text regions in a large text block, created in the following format:

...
[region:region-name]
multi line
text block
[/region]
...
[region:another-region-name]
more
multi-line text
[/region]

I have this almost working with the following pattern:

\[region:(?'link'.*)\](?'text'(.|[\r\n])*)\[/region\]

This works if there is only one region in the entire text. But when there are several, I get a single match whose 'text' group includes all the other regions as well. I have a feeling this should be solved with a negative lookahead, but as a non-pro with regex, I don't know how to modify the pattern above to do it right. Can someone help?
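To make the behaviour above concrete, here is a minimal C# sketch (assuming .NET 5 or later with top-level statements; the variable names are only illustrative). It runs the original pattern against the sample input. Because (.|[\r\n])* is greedy, the engine consumes the rest of the input and backtracks only as far as the last [/region], so the result is a single match whose 'text' group also contains the second region.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input in the format from the question (two regions).
string input =
    "...\n" +
    "[region:region-name]\n" +
    "multi line\n" +
    "text block\n" +
    "[/region]\n" +
    "...\n" +
    "[region:another-region-name]\n" +
    "more\n" +
    "multi-line text\n" +
    "[/region]\n";

// The original pattern: (.|[\r\n])* is greedy, so it extends to the LAST [/region].
const string greedyPattern = @"\[region:(?'link'.*)\](?'text'(.|[\r\n])*)\[/region\]";

MatchCollection greedy = Regex.Matches(input, greedyPattern);

Console.WriteLine($"Matches: {greedy.Count}");                 // Matches: 1
Console.WriteLine($"link = {greedy[0].Groups["link"].Value}"); // link = region-name

// The 'text' group swallows the whole second region, header included.
Console.WriteLine(greedy[0].Groups["text"].Value.Contains("[region:another-region-name]")); // True
```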

Answer 1

You can do this without lookahead:

\[region:(?'link'.*)\](?'text'(?s).*?)\[/region\]

The additional ? makes the * quantifier lazy, so it matches as few characters as possible and stops at the first [/region]. The (?s) allows the dot to match newlines after this position, so you don't need the (.|[\r\n]) construction (an alternative would be [\s\S]).
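A sketch of how Answer 1's pattern might be used from C# to collect every region, assuming .NET 5 or later with top-level statements. ExtractRegions is a hypothetical helper name, and trimming the line breaks around each block is an extra step the answer itself does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts every [region:name] ... [/region] block.
// Pattern from Answer 1: (?s) lets '.' match newlines in the 'text' part,
// and the lazy .*? stops each match at the first [/region] after its header.
static List<(string Link, string Text)> ExtractRegions(string input)
{
    var regex = new Regex(@"\[region:(?'link'.*)\](?'text'(?s).*?)\[/region\]");
    var regions = new List<(string Link, string Text)>();

    foreach (Match m in regex.Matches(input))
    {
        // Trim the line breaks just after the header and just before [/region].
        string text = m.Groups["text"].Value.Trim('\r', '\n');
        regions.Add((m.Groups["link"].Value, text));
    }
    return regions;
}

string sample =
    "...\n" +
    "[region:region-name]\n" +
    "multi line\n" +
    "text block\n" +
    "[/region]\n" +
    "...\n" +
    "[region:another-region-name]\n" +
    "more\n" +
    "multi-line text\n" +
    "[/region]\n";

foreach (var (link, text) in ExtractRegions(sample))
{
    // Prints "region-name: multi line / text block"
    // then   "another-region-name: more / multi-line text"
    Console.WriteLine($"{link}: {text.Replace("\n", " / ")}");
}
```

Regex.Matches resumes scanning where the previous match ended, so the lazy quantifier is what lets each region come back as its own match.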

Answer 2

You don't need a negative lookahead; you just need to make (?'text'(.|[\r\n])*) "non-greedy", so that it matches up to the first instance of [/region] rather than the last. You can do this by adding ? after the *, so the resulting pattern would be:

\[region:(?'link'.*)\](?'text'(.|[\r\n])*?)\[/region\]
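Answer 2's pattern should find the same regions as Answer 1's. The sketch below compares them, together with the [\s\S] variant mentioned in Answer 1 (assuming .NET 5 or later; Links is a hypothetical helper). It also shows one caveat: RegexOptions.Singleline applies to the whole pattern, so it is not a drop-in replacement for the inline (?s) here. Answer 1 places (?s) after the link group, whereas the option would let the greedy (?'link'.*) run across lines.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the 'link' names found by a pattern, in match order.
static string[] Links(string input, string pattern, RegexOptions options) =>
    Regex.Matches(input, pattern, options)
         .Cast<Match>()
         .Select(m => m.Groups["link"].Value)
         .ToArray();

string sample =
    "...\n[region:region-name]\nmulti line\ntext block\n[/region]\n" +
    "...\n[region:another-region-name]\nmore\nmulti-line text\n[/region]\n";

// Answer 2: lazy quantifier on the original (.|[\r\n]) alternation.
string[] answer2 = Links(sample,
    @"\[region:(?'link'.*)\](?'text'(.|[\r\n])*?)\[/region\]", RegexOptions.None);

// Alternative from Answer 1: [\s\S] matches any character without needing (?s).
string[] anyChar = Links(sample,
    @"\[region:(?'link'.*)\](?'text'[\s\S]*?)\[/region\]", RegexOptions.None);

// Caution: Singleline affects the WHOLE pattern, so (?'link'.*) can now cross
// newlines, and the two regions collapse into one match.
string[] wholePattern = Links(sample,
    @"\[region:(?'link'.*)\](?'text'.*?)\[/region\]", RegexOptions.Singleline);

Console.WriteLine(string.Join(", ", answer2)); // region-name, another-region-name
Console.WriteLine(string.Join(", ", anyChar)); // region-name, another-region-name
Console.WriteLine(wholePattern.Length);        // 1
```

(.|[\r\n]) and [\s\S] behave the same on this input; the character class avoids an alternation on every character, which is typically cheaper for a backtracking engine.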


solution1: 4 votes, accepted, 2011-02-15 18:02:53
solution2: 1 vote, 2011-02-15 18:02:44
