
Regex matching to extract multi-line text regions (C#)

I'm looking to capture text regions in a large text block, created in the following format:

...
[region:region-name]
multi line
text block
[/region]
...
[region:another-region-name]
more
multi-line text
[/region]

I have this almost worked out with

\[region:(?'link'.*)\](?'text'(.|[\r\n])*)\[/region\]

This works if there is only one region in the entire text. But when there are multiple, it gives me just one block, with every other region included in the 'text' of that one. I have a feeling that this should be solved with a negative lookahead, but being a non-pro with regex, I don't know how to modify the above to do it right. Can someone help?

You can do this without lookahead:

\[region:(?'link'.*)\](?'text'(?s).*?)\[/region\]

The additional ? makes the * quantifier lazy, so it will match as few characters as possible. And the (?s) allows the dot to match newlines from that position to the end of the enclosing group, so you don't have to use the (.|[\r\n]) construction (an alternative would be [\s\S]).
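For reference, here is a minimal sketch of how that pattern might be used from C#. The class name RegionExtractor, the method name Extract, and the Trim() call on the captured body are my own assumptions for illustration, not part of the question; only Regex.Matches and the named groups come from the pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class RegionExtractor
{
    // Pattern from the answer: lazy .*? with (?s) scoped to the 'text' group.
    private static readonly Regex RegionPattern =
        new Regex(@"\[region:(?'link'.*)\](?'text'(?s).*?)\[/region\]");

    // Returns (region name, region body) pairs in the order they appear.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var regions = new List<KeyValuePair<string, string>>();
        foreach (Match m in RegionPattern.Matches(input))
        {
            // Trim() is an assumption: it drops the line breaks that sit
            // right after the opening tag and right before [/region].
            regions.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value,
                m.Groups["text"].Value.Trim()));
        }
        return regions;
    }
}
```

Calling RegionExtractor.Extract on the sample text from the question should give two entries, one per region, each with its own name and body. If region names could ever contain a ] character, the 'link' group would need a stricter pattern; that case is not covered here.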

You don't need a negative lookahead; you just need to change (?'text'(.|[\r\n])*) to be "non-greedy", so that it will match up to the first instance of [/region] rather than the last. You can do this by adding ? after *, so the resulting pattern would be:

\[region:(?'link'.*)\](?'text'(.|[\r\n])*?)\[/region\]
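To see the difference, a small sketch like the following compares the match counts of the greedy and non-greedy patterns on a two-region sample. The class name GreedyVsLazy, its members, and the sample string are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named for illustration only.
public static class GreedyVsLazy
{
    // Two regions, in the same layout as in the question.
    public const string Sample =
        "[region:a]\none\n[/region]\n[region:b]\ntwo\n[/region]\n";

    // Original pattern from the question: greedy * runs to the LAST [/region].
    public const string Greedy =
        @"\[region:(?'link'.*)\](?'text'(.|[\r\n])*)\[/region\]";

    // Same pattern with *? so it stops at the FIRST [/region].
    public const string Lazy =
        @"\[region:(?'link'.*)\](?'text'(.|[\r\n])*?)\[/region\]";

    // Counts how many separate matches a pattern finds in the sample.
    public static int CountMatches(string pattern)
    {
        return Regex.Matches(Sample, pattern).Count;
    }
}
```

Here GreedyVsLazy.CountMatches(GreedyVsLazy.Greedy) finds a single match that swallows both regions, while the Lazy pattern finds two separate matches.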
