Regular expressions: Multiline issue with HTML
I'm playing around with websites and regular expressions in C#. I have this situation:
<a href="path/to/image">
<img src="thumbnail"></a>
That layout is how my application receives the content of a given website. The tabs and line breaks are not the same for each row.
I use gskinner to check the regex (http://gskinner.com/RegExr/) and I have created this regular expression:
(?i)<a([^>]+)>\W.*</a>
Flags: Multiline
Gskinner shows that the pattern is correct. But when I put it in C# (regEx.Matches(...)), it no longer finds the matches.
Does anyone have any clue how to do this?
Thanks
Using HtmlAgilityPack and your sample string:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
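// href of the first <a> element (null if the document contains no <a>)
var href = doc.DocumentNode
    .Descendants("a")
    .Select(n => n.Attributes["href"].Value)
    .FirstOrDefault();

// src of the first <img> element (null if the document contains no <img>)
var src = doc.DocumentNode
    .Descendants("img")
    .Select(n => n.Attributes["src"].Value)
    .FirstOrDefault();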
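
If a regular expression is still wanted, one plausible reason the pattern stops matching in C# is the line endings. In .NET, RegexOptions.Multiline only changes how ^ and $ behave; the dot still refuses to match \n unless RegexOptions.Singleline is set. The sketch below assumes the downloaded markup uses \r\n line endings (the question does not say so), in which case \W consumes the \r and .* then gets stuck on the \n. The lazy .*? is my own tweak, added so that a page with several links does not collapse into a single match.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample markup from the question. The \r\n line ending is an assumption.
        string html = "<a href=\"path/to/image\">\r\n\t<img src=\"thumbnail\"></a>";

        // Original pattern and flag: Multiline only affects ^ and $,
        // so '.' cannot step over the '\n' that follows the '\r'.
        var multiline = new Regex(@"(?i)<a([^>]+)>\W.*</a>", RegexOptions.Multiline);

        // Singleline lets '.' match '\n' too; '.*?' stops at the first </a>.
        var singleline = new Regex(@"(?i)<a([^>]+)>\W.*?</a>", RegexOptions.Singleline);

        Console.WriteLine(multiline.Matches(html).Count);  // prints 0
        Console.WriteLine(singleline.Matches(html).Count); // prints 1
    }
}
```

This only illustrates the flag difference on the two-line sample. For real pages the HtmlAgilityPack approach above is likely to be more robust than any pattern.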