I have a question about .NET regular expressions and how they define matches. I am writing:
var regex = new Regex("<tr><td>1</td><td>(.+)</td><td>(.+)</td>");
if (regex.IsMatch(str))
{
    var groups = regex.Match(str).Groups;
    var matches = new List<string>();
    for (int i = 1; i < groups.Count; i++)
        matches.Add(groups[i].Value);
    return matches;
}
What I want is to get the content of the two <td> cells that follow. Instead it returns:
[0]: Cell 1</td><td>Cell 2</td>... [1]: Last row of the table
Why is the first match taking </td> and the rest of the string instead of stopping at </td>?
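Your regular expression includes (.+), which is a greedy match. A greedy quantifier consumes as many characters as it can and gives characters back only as far as needed for the rest of the pattern ( </td> in your case) to match, so it ends at the last possible </td> rather than the first. Try:
(.+?)
This is a non-greedy (lazy) match, which extends as little as possible before the rest of the pattern matches.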
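To see the difference concretely, here is a minimal sketch that runs both forms against a short one-line row. The sample string, the class name GreedyVsLazy, and the helper FirstGroup are made up for illustration and are not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns capture group 1 of the first match,
    // or an empty string when the pattern does not match.
    public static string FirstGroup(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Made-up sample input with two cells on one line.
        string row = "<td>Cell 1</td><td>Cell 2</td>";

        // Greedy: .+ runs to the end of the line, then backtracks to the LAST </td>.
        Console.WriteLine(FirstGroup("<td>(.+)</td>", row));   // Cell 1</td><td>Cell 2

        // Lazy: .+? grows one character at a time and stops at the FIRST </td>.
        Console.WriteLine(FirstGroup("<td>(.+?)</td>", row));  // Cell 1
    }
}
```

Note that . does not match a newline unless RegexOptions.Singleline is set, so either form only ever captures within a single line.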
You need to specify lazy matching. Instead of +, use +? to say that as few characters as possible should match.
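Applied to the code in the question, the change might look like the sketch below. The class name RowParser and the method name ExtractCells are hypothetical wrappers, since the question only shows a fragment. The sketch also calls Match once and checks Success, instead of calling IsMatch and then Match, which runs the regex twice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowParser
{
    // Same pattern as in the question, with both quantifiers made lazy.
    private static readonly Regex RowRegex =
        new Regex("<tr><td>1</td><td>(.+?)</td><td>(.+?)</td>");

    // Hypothetical wrapper: returns the captured cell contents,
    // or an empty list when the input contains no matching row.
    public static List<string> ExtractCells(string str)
    {
        var matches = new List<string>();
        Match match = RowRegex.Match(str);
        if (!match.Success)
            return matches;

        // Group 0 is the whole match; the capture groups start at index 1.
        for (int i = 1; i < match.Groups.Count; i++)
            matches.Add(match.Groups[i].Value);
        return matches;
    }
}
```

For an input such as "<tr><td>1</td><td>Cell 1</td><td>Cell 2</td></tr><tr><td>2</td><td>x</td><td>Last row of the table</td></tr>" (a made-up table), ExtractCells returns "Cell 1" and "Cell 2".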