

C#: Regex match closest

<table class="listProvision" class="itable">
    <tr>
        <td class="whatever">some infos</td>
        <td>some more infos</td>
        <td>13908402</td>
        <td>hello world</td>
    </tr>
    <tr>
        <td class="whatever">some infos</td>
        <td>some more infos</td>
        <td id="num">13908402</td>
        <td>hello world</td>
    </tr>
</table>

Given the sample HTML above, how can I properly parse all occurrences of <tr>...</tr> inside the table with class listProvision?

I tried <table.*?listProvision.*?>(?:.*?<tr.*?>(.*?)</tr>)+.*?</table>, but I can't figure out what's wrong. There is never going to be any complicated HTML pulled into this regex, so don't worry about that.

Here is a sample of how you can parse the HTML string with Html Agility Pack:

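using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html holds the markup shown above

// Rows directly under the table; SelectNodes returns null if nothing matches
var rows = doc.DocumentNode
              .SelectNodes("//table[@class='listProvision']/tr");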

Then you can use the HtmlNode.InnerHtml property of each row to get everything between the <tr>...</tr> tags.
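A slightly fuller sketch of the Html Agility Pack approach, wrapped in a method so it can be called and checked. It assumes the HtmlAgilityPack NuGet package is referenced; the class name RowParser and the method ExtractCells are hypothetical, chosen only for illustration. Instead of each row's InnerHtml, it reads every cell's InnerText, which drops the tags.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

// Hypothetical helper, for illustration only.
public static class RowParser
{
    // Returns the trimmed text of each <td>, one list per <tr>
    // that sits directly under <table class="listProvision">.
    public static List<List<string>> ExtractCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode
            .SelectNodes("//table[@class='listProvision']/tr");
        if (rows == null)
            return result;

        foreach (HtmlNode row in rows)
        {
            // Elements("td") yields only the direct <td> children of this row.
            result.Add(row.Elements("td")
                          .Select(td => td.InnerText.Trim())
                          .ToList());
        }
        return result;
    }
}
```

Checking for null before iterating matters here: calling foreach on the result of SelectNodes when the table is missing would otherwise throw a NullReferenceException.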

1) Use RegexOptions.Singleline so that the dot also matches newlines. (Your regex already works; I got it working here with just the single-line flag.)

2) Read match.Groups["yourNamedCaptureGroup"].Captures to get all of your captures; the group's Value holds only the last one. With the unnamed group in your pattern, that is match.Groups[1].Captures.
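Putting both points together, here is a minimal sketch that runs the question's pattern unchanged with RegexOptions.Singleline and walks Groups[1].Captures, since the pattern uses an unnamed group. The names RowRegex and ExtractRows are hypothetical, for illustration only; Trim() just removes the whitespace between the tags.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RowRegex
{
    // The pattern from the question, unchanged; group 1 is unnamed.
    public const string Pattern =
        @"<table.*?listProvision.*?>(?:.*?<tr.*?>(.*?)</tr>)+.*?</table>";

    // Returns the inner markup of every <tr> in the first matching table.
    public static List<string> ExtractRows(string html)
    {
        // Singleline lets '.' match '\n', so each lazy .*? can span lines.
        Match match = Regex.Match(html, Pattern, RegexOptions.Singleline);

        var rows = new List<string>();
        if (!match.Success)
            return rows;

        // Groups[1].Value would hold only the LAST iteration of the repeated
        // group; Groups[1].Captures holds every iteration, in order.
        foreach (Capture capture in match.Groups[1].Captures)
            rows.Add(capture.Value.Trim());

        return rows;
    }
}
```

Without the Singleline flag, the same pattern finds no match on the multi-line sample, because .*? cannot cross the line break that follows the opening <table> tag.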

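Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.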

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM