In C#, I have the following Regex pattern (on an HTML string):
Regex TR = new Regex(@"<tr class=""(\w+)"" rel=""(\w+)"">(.+)</tr>");
The problem is that when I run it, the match includes everything up to the last </tr> occurrence in the HTML code. There are many <tr> tags in the code, so the (.+) pattern includes them and stops only at the last occurrence of </tr>.
I've tried using (\w+) instead, but it doesn't match certain characters inside the tags.
So how can I make this pattern stop at the first </tr>, and not go on to the last one in the code?
The following Regex pattern will stop at the first </tr> tag:
<tr(\s+)class(\s*)=(\s*)"[^"]*"(\s+)rel(\s*)=(\s*)"[^"]*"(\s*)>(.(?!<\/tr>))*[\s\S]<\/tr>
You can change your code to the following to get what you want. Wrapping the lookahead part in an outer group makes group 3 hold the whole row content:
Regex TR = new Regex(@"<tr class=""(\w+)"" rel=""(\w+)"">((?:.(?!<\/tr>))*[\s\S])<\/tr>");
(?!ABC) is called a negative lookahead. It asserts that ABC does not match immediately after the current position, without consuming any characters; if ABC does match there, that path of the match is rejected.
For future reference: Try using RegExr to create and test your regex patterns.
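To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question with a lookahead-based one. The sample HTML and the class and field names are made up for illustration. The lookahead is placed before the dot, (?:(?!</tr>).)*, which is a common variant of the same idea and not the exact form shown above; it also copes with an empty row.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstRowDemo
{
    // Hypothetical sample input: two rows on a single line.
    public const string Html =
        "<tr class=\"odd\" rel=\"r1\"><td>A</td></tr><tr class=\"even\" rel=\"r2\"><td>B</td></tr>";

    // Greedy pattern from the question: (.+) runs on to the last </tr>.
    public static readonly Regex Greedy =
        new Regex(@"<tr class=""(\w+)"" rel=""(\w+)"">(.+)</tr>");

    // Lookahead variant: a character is consumed only if </tr> does not start at it,
    // so group 3 ends right before the first </tr>.
    public static readonly Regex Tempered =
        new Regex(@"<tr class=""(\w+)"" rel=""(\w+)"">((?:(?!</tr>).)*)</tr>");

    public static void Main()
    {
        // The greedy version yields one match that swallows both rows.
        Console.WriteLine(Greedy.Matches(Html).Count);

        // The lookahead version yields one match per row.
        foreach (Match m in Tempered.Matches(Html))
        {
            Console.WriteLine($"{m.Groups[1].Value} {m.Groups[2].Value} {m.Groups[3].Value}");
        }
    }
}
```

As in the question's pattern, the dot does not match newlines by default, so rows that span several lines would presumably also need RegexOptions.Singleline.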
> So how can I make this pattern stop at the first </tr>
The most effective approach to capturing is not to consume blindly, but to consume only what is known.
Since the text to grab falls between the anchors > and <, why not use the ending anchor, the <, to give the regex parser a hint?
By using the ^ character at the start of a set [ ] (where it means "not in this set"), we tell the parser to consume until one of the listed characters is hit.
In your case, change
>(.+)</tr>
to
>([^<]+)</tr>
where [^<]+ says: consume one or more characters, stopping as soon as a < is hit. Note that this works only when the row content itself contains no < character, that is, no nested tags such as <td>.
The [^ ] set is a powerful construct, which I use in 90% of my regex patterns instead of blindly consuming with .+ or the even more side-effect-prone .*.
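A minimal sketch of the negated-set approach follows. The input string and the names NegatedSetDemo, Row and NestedRow are assumptions made for illustration, and the rows deliberately contain plain text only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedSetDemo
{
    // Hypothetical input: row content is plain text with no nested tags.
    public const string Html =
        "<tr class=\"odd\" rel=\"r1\">first</tr><tr class=\"even\" rel=\"r2\">second</tr>";

    // Hypothetical input with a nested <td>, to show the limitation.
    public const string NestedRow =
        "<tr class=\"odd\" rel=\"r1\"><td>first</td></tr>";

    // [^<]+ consumes one or more characters that are not '<',
    // so the capture cannot run past the start of the closing tag.
    public static readonly Regex Row =
        new Regex(@"<tr class=""(\w+)"" rel=""(\w+)"">([^<]+)</tr>");

    public static void Main()
    {
        foreach (Match m in Row.Matches(Html))
        {
            Console.WriteLine(m.Groups[3].Value); // "first", then "second"
        }

        // No match: the content starts with '<', which [^<]+ refuses to consume.
        Console.WriteLine(Row.IsMatch(NestedRow)); // False
    }
}
```

The second check shows the trade-off: the negated set is simple and cannot overrun, but it does not match rows whose content contains further tags.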
Also, to make your pattern easier to handle, use \x22 in lieu of " so that you are not fighting the C# parser before the regex parser.
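As a small illustration of that tip, the sketch below writes the same pattern twice, once with doubled quotes in a verbatim string and once with the \x22 hex escape, and checks that both match the same text. The sample row and the names QuoteEscapeDemo, Doubled and HexEscaped are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteEscapeDemo
{
    // Hypothetical sample row.
    public const string Html = "<tr class=\"odd\" rel=\"r1\">first</tr>";

    // Quotes doubled for the C# verbatim string.
    public static readonly Regex Doubled =
        new Regex(@"<tr class=""(\w+)"" rel=""(\w+)"">([^<]+)</tr>");

    // \x22 is the regex hex escape for the double-quote character,
    // so the C# string itself contains no quotes to escape.
    public static readonly Regex HexEscaped =
        new Regex(@"<tr class=\x22(\w+)\x22 rel=\x22(\w+)\x22>([^<]+)</tr>");

    public static void Main()
    {
        Match a = Doubled.Match(Html);
        Match b = HexEscaped.Match(Html);
        Console.WriteLine(a.Success && b.Success && a.Value == b.Value); // True
    }
}
```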