Regex matches in C#: get groups of strings that do not contain a pattern

I'm trying to get a collection of substrings from a string, in this example pairs of <tag></tag>. Given the string:

<tag>abc</tag><tag>123</tag>

I want 2 groups: <tag>abc</tag> and <tag>123</tag>

That's easy with the <tag>.*?</tag> pattern.
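A minimal C# sketch of this easy case, assuming a console app with top-level statements (C# 9 or later). `FindAll` is a hypothetical helper name; it only collects `Match.Value` from `Regex.Matches`:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the full text of every match, in order.
static List<string> FindAll(string text, string pattern)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        results.Add(m.Value);
    }
    return results;
}

// The lazy .*? stops at the first "</tag>" after each "<tag>".
var matches = FindAll("<tag>abc</tag><tag>123</tag>", "<tag>.*?</tag>");
foreach (var s in matches)
{
    Console.WriteLine(s);
}
// Prints:
// <tag>abc</tag>
// <tag>123</tag>
```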


But I would like it to be more precise.

Given the string:

<tag>abc</tag><tag><tag>123</tag>

I would like it to skip the extra <tag> in the middle (because I'm searching for matching opening and closing tags).

I want this result:

<tag>abc</tag>
<tag>123</tag>

I've tried using a lookahead or lookbehind, with no luck (I'm sure I'm using it wrong):

<tag>.*?(?<!<tag>)</tag>

I assume <tag> and </tag> are just examples of leading/trailing delimiters.

Note that lazy dot matching will still match from the first leading delimiter up to the first occurrence of the trailing delimiter, including any occurrences of the leading delimiter in between.
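To see this concretely, here is a sketch (same assumptions: top-level statements, hypothetical `FindAll` helper) that runs both the lazy pattern and the lookbehind attempt from the question on the second input. Both return `<tag><tag>123</tag>` as the second match, because the lookbehind only inspects the characters immediately before `</tag>`, not the whole matched span:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the full text of every match, in order.
static List<string> FindAll(string text, string pattern)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        results.Add(m.Value);
    }
    return results;
}

const string sample = "<tag>abc</tag><tag><tag>123</tag>";

// Lazy dot: starts at the first "<tag>" and runs to the first "</tag>",
// swallowing the stray "<tag>" in between.
var lazy = FindAll(sample, "<tag>.*?</tag>");

// Lookbehind attempt: (?<!<tag>) only checks the five characters right
// before "</tag>", so an earlier "<tag>" inside the match is not seen.
var lookbehind = FindAll(sample, "<tag>.*?(?<!<tag>)</tag>");

Console.WriteLine(string.Join(" | ", lazy));       // <tag>abc</tag> | <tag><tag>123</tag>
Console.WriteLine(string.Join(" | ", lookbehind)); // <tag>abc</tag> | <tag><tag>123</tag>
```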

To work around it, use a tempered greedy token:

<tag>(?:(?!</?tag>).)*</tag>
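A sketch of applying the tempered greedy token with `Regex.Matches` (hypothetical `FindAll` helper, top-level statements assumed). One detail worth knowing: in .NET, `.` does not match `\n` unless `RegexOptions.Singleline` is set, so tag content that spans lines needs that option:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the full text of every match, in order.
static List<string> FindAll(string text, string pattern, RegexOptions options = RegexOptions.None)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern, options))
    {
        results.Add(m.Value);
    }
    return results;
}

// Tempered greedy token: take one character at a time, but only where
// neither "<tag>" nor "</tag>" starts.
const string tempered = "<tag>(?:(?!</?tag>).)*</tag>";

var pairs = FindAll("<tag>abc</tag><tag><tag>123</tag>", tempered);
Console.WriteLine(string.Join(" | ", pairs)); // <tag>abc</tag> | <tag>123</tag>

// '.' stops at '\n' by default; Singleline lets the token cross line breaks.
var multiLine = FindAll("<tag>line1\nline2</tag>", tempered, RegexOptions.Singleline);
Console.WriteLine(multiLine.Count); // 1
```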


Since the lookahead is executed at every position, this construct is rather resource-consuming. You can unroll it as:

<tag>[^<]*(?:<(?!/?tag>)[^<]*)*</tag>
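One way to sanity-check the unrolled version is to run it side by side with the tempered one. This is a sketch with made-up sample strings and a hypothetical `FindAll` helper. Because `[^<]` also matches newlines, the tempered pattern is given `RegexOptions.Singleline` here so the two are comparable:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the full text of every match, in order.
static List<string> FindAll(string text, string pattern, RegexOptions options = RegexOptions.None)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern, options))
    {
        results.Add(m.Value);
    }
    return results;
}

const string tempered = "<tag>(?:(?!</?tag>).)*</tag>";
// Unrolled form: runs of non-'<' characters; a '<' is consumed only
// when it does not begin "<tag>" or "</tag>".
const string unrolled = "<tag>[^<]*(?:<(?!/?tag>)[^<]*)*</tag>";

string[] samples =
{
    "<tag>abc</tag><tag><tag>123</tag>",
    "<tag>a<b>c</tag>",
    "<tag>line1\nline2</tag>",
};

foreach (var s in samples)
{
    var a = string.Join(" | ", FindAll(s, tempered, RegexOptions.Singleline));
    var b = string.Join(" | ", FindAll(s, unrolled));
    // expected: "same" for every sample
    Console.WriteLine(a == b ? "same" : "DIFF");
}
```

In the unrolled form, ordinary characters are consumed by `[^<]*` without any lookahead; the lookahead only runs at each `<`, which is where the savings come from.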


This one allows getting only letters and digits:

<tag>(.[a-zA-Z\d]*)</tag>
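A sketch of using this pattern from C# and reading the captured text through `Groups[1]` (hypothetical `Captures` helper, top-level statements assumed). It also shows that the leading `.` lets any single character through, so strictly speaking only the characters after the first one are limited to letters and digits:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text captured by group 1 of each match.
static List<string> Captures(string text, string pattern)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        results.Add(m.Groups[1].Value);
    }
    return results;
}

const string alnumPattern = @"<tag>(.[a-zA-Z\d]*)</tag>";

// The attempt starting at the first of the two adjacent "<tag>"s fails,
// because only letters and digits may follow the first character.
Console.WriteLine(string.Join(", ", Captures("<tag>abc</tag><tag><tag>123</tag>", alnumPattern)));
// abc, 123

// The leading '.' accepts any one character, so this is captured too.
Console.WriteLine(string.Join(", ", Captures("<tag>-abc</tag>", alnumPattern)));
// -abc
```

In .NET, `\d` matches any Unicode decimal digit; if only ASCII digits are wanted, `[a-zA-Z0-9]` or `RegexOptions.ECMAScript` restricts it.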
