.NET Regular Expressions - Shorter match

Question

I have a question regarding .NET regular expressions and how they define matches. I am writing:

var regex = new Regex("<tr><td>1</td><td>(.+)</td><td>(.+)</td>");
var match = regex.Match(str);
if (match.Success)
{
    var matches = new List<string>();
    // Group 0 is the whole match, so start at 1 for the captured groups.
    for (int i = 1; i < match.Groups.Count; i++)
        matches.Add(match.Groups[i].Value);

    return matches;
}

What I want is to get the content of the next two tags. Instead it returns:

 [0]: Cell 1</td><td>Cell 2</td>... [1]: Last row of the table

Why does the first group capture </td> and the rest of the string instead of stopping at the first </td>?

Answer 1

Your regular expression includes

(.+)

which is a greedy match. A greedy quantifier consumes as much input as it can and gives characters back only as far as needed for the rest of the pattern ( </td> in your case) to match. Try:

(.+?)

This is a non-greedy (lazy) match, which extends as little as possible before the next part of the pattern matches.
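
To see the difference side by side, here is a minimal sketch that runs both patterns against the same input. The sample row in `html`, the class name `LazyMatchDemo`, and the helper `Capture` are made up for illustration and are not from the question; only `Regex.Match`, `Match.Success`, and `Match.Groups` are the real .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // Hypothetical helper: returns the captured groups of the first match
    // (group 0, the whole match, is skipped), or an empty list if nothing matches.
    public static List<string> Capture(string pattern, string input)
    {
        var result = new List<string>();
        Match match = Regex.Match(input, pattern);
        if (match.Success)
        {
            for (int i = 1; i < match.Groups.Count; i++)
                result.Add(match.Groups[i].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample input; the real table in the question is not shown.
        string html = "<tr><td>1</td><td>Cell 1</td><td>Cell 2</td><td>Cell 3</td></tr>";

        // Greedy: the first (.+) runs to the last place the rest can still match.
        List<string> greedy = Capture("<tr><td>1</td><td>(.+)</td><td>(.+)</td>", html);

        // Lazy: each (.+?) stops at the first </td> that lets the rest match.
        List<string> lazy = Capture("<tr><td>1</td><td>(.+?)</td><td>(.+?)</td>", html);

        Console.WriteLine(string.Join(" | ", greedy)); // Cell 1</td><td>Cell 2 | Cell 3
        Console.WriteLine(string.Join(" | ", lazy));   // Cell 1 | Cell 2
    }
}
```

With the greedy pattern the first group swallows an extra cell, which mirrors the output described in the question; with the lazy pattern each group holds exactly one cell.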

Answer 2

You need to specify lazy matching. Instead of + , use +? so that as few characters as possible are matched.
