
Regular expressions: Multiline issue with HTML

I'm playing around with websites and regular expressions in C#. I have this situation:

             <a href="path/to/image">
    <img src="thumbnail"></a>

That layout is how my application receives the content of a given website. The tabs and line breaks are not the same for each row.

I use gskinner's RegExr (http://gskinner.com/RegExr/) to test the regex, and I have created this regular expression:

            (?i)<a([^>]+)>\W.*</a>

Flags: Multiline

RegExr shows that the pattern matches. But when I use it in C# (regEx.Matches(...)), it no longer finds any matches.

Does anyone have a clue how to do this?

Thanks

Using HtmlAgilityPack and your sample string:

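using System.Linq;
using HtmlAgilityPack;

// The sample markup from the question
string html = "<a href=\"path/to/image\">\n    <img src=\"thumbnail\"></a>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// GetAttributeValue returns the fallback (null) instead of throwing
// when an element lacks the attribute
var href = doc.DocumentNode
    .Descendants("a")
    .Select(n => n.GetAttributeValue("href", null))
    .FirstOrDefault();

var src = doc.DocumentNode
    .Descendants("img")
    .Select(n => n.GetAttributeValue("src", null))
    .FirstOrDefault();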
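If you would rather keep the regular-expression approach from the question, here is a minimal sketch of one likely cause and a workaround. It assumes the downloaded page uses Windows-style `\r\n` line breaks (the `html` string is a hypothetical sample), whereas the text pasted into RegExr may have contained a single `\n`. In .NET, `RegexOptions.Multiline` only changes how `^` and `$` behave; `.` never matches `\n` unless `RegexOptions.Singleline` is set. With `\r\n`, `\W` consumes only the `\r`, and `.*` cannot cross the following `\n`. The `tolerant` pattern is a hypothetical alternative, not the asker's method.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: the question's markup with "\r\n" line breaks (an assumption)
string html = "             <a href=\"path/to/image\">\r\n    <img src=\"thumbnail\"></a>";

// The question's pattern. Multiline only affects ^ and $, so "." still stops at "\n":
// \W consumes the "\r", ".*" cannot cross the "\n", and the match fails.
var original = new Regex(@"(?i)<a([^>]+)>\W.*</a>", RegexOptions.Multiline);

// Hypothetical alternative: \s* absorbs any mix of spaces, tabs, \r and \n,
// and Singleline lets the lazy (.*?) span line breaks up to the first </a>.
var tolerant = new Regex(@"<a\s([^>]+)>\s*(.*?)</a>",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Console.WriteLine(original.Matches(html).Count); // 0

foreach (Match m in tolerant.Matches(html))
{
    // Group 1: the <a> attributes; group 2: the inner markup (the <img> tag)
    Console.WriteLine(m.Groups[1].Value); // href="path/to/image"
    Console.WriteLine(m.Groups[2].Value); // <img src="thumbnail">
}
```

Even so, this pattern stays fragile for arbitrary HTML (for example, a quoted attribute value containing `>` breaks `[^>]+`), which is why parsing with HtmlAgilityPack is the sturdier route.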

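Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.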