
C#: Regex match closest

<table class="listProvision" class="itable">
    <tr>
        <td class="whatever">some infos</td>
        <td>some more infos</td>
        <td>13908402</td>
        <td>hello world</td>
    </tr>
    <tr>
        <td class="whatever">some infos</td>
        <td>some more infos</td>
        <td id="num">13908402</td>
        <td>hello world</td>
    </tr>
</table>

Given the sample HTML above, how can I properly parse every occurrence of <tr>...</tr> inside the table with class listProvision?

I tried <table.*?listProvision.*?>(?:.*?<tr.*?>(.*?)</tr>)+.*?</table>, but I can't figure out what's wrong. The HTML fed to this regex will never be complicated, so don't worry about that.

Here is a sample of how you can parse an HTML string with Html Agility Pack:

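using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html holds the markup shown above

// SelectNodes returns null (not an empty collection) when nothing matches
var rows = doc.DocumentNode
              .SelectNodes("//table[@class='listProvision']/tr");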

Then use the HtmlNode.InnerHtml property of each row to get everything between the <tr> and </tr> tags.

1) Use RegexOptions.Singleline so that . also matches newline characters. Your regex otherwise works as is; I got it working with just the single-line flag added.

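2) Access match.Groups["yourNamedCaptureGroup"].Captures to get every capture of the repeated group. With the unnamed group in your current pattern, that is match.Groups[1].Captures.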
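A minimal sketch of both points, assuming a .NET console project that allows top-level statements (C# 9 or later). It runs the asker's pattern against the sample table once without and once with RegexOptions.Singleline, then walks the captures of the repeated group. The group name row is hypothetical and chosen only for illustration; the asker's pattern uses an unnamed group, which would be Groups[1].

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input: the markup from the question, kept on multiple lines
// so that RegexOptions.Singleline actually matters.
string html = @"<table class=""listProvision"" class=""itable"">
    <tr>
        <td class=""whatever"">some infos</td>
        <td>some more infos</td>
        <td>13908402</td>
        <td>hello world</td>
    </tr>
    <tr>
        <td class=""whatever"">some infos</td>
        <td>some more infos</td>
        <td id=""num"">13908402</td>
        <td>hello world</td>
    </tr>
</table>";

// The asker's pattern, with the inner group named "row" (hypothetical name,
// for illustration only); otherwise identical to the original.
const string pattern =
    @"<table.*?listProvision.*?>(?:.*?<tr.*?>(?<row>.*?)</tr>)+.*?</table>";

// Without Singleline, '.' stops at '\n', so the multi-line table never matches.
bool matchesWithoutFlag = Regex.IsMatch(html, pattern);

// With Singleline, '.' also matches '\n' and the whole table is matched.
Match match = Regex.Match(html, pattern, RegexOptions.Singleline);

// A group inside a repeated (?:...)+ records every iteration in Captures;
// match.Groups["row"].Value on its own would only hold the last <tr>.
List<string> rows = new List<string>();
foreach (Capture c in match.Groups["row"].Captures)
{
    rows.Add(c.Value.Trim());
}

Console.WriteLine($"Matches without Singleline: {matchesWithoutFlag}");
Console.WriteLine($"Rows captured: {rows.Count}");
foreach (string row in rows)
{
    Console.WriteLine(row);
    Console.WriteLine("---");
}
```

Because the lazy .*?<tr inside the repeated group always stops at the next <tr, each capture holds exactly one row's inner markup, and the loop ends once no further <tr> precedes </table>.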
