C# Regex extract content of a div
I've seen some related questions and tried their answers, but they don't work. I want to match the content of a div with the id "thumbs". But regex.Success returns false :(
Match regex = Regex.Match(html, @"<div[^>]*id=""thumbs"">(.+?)</div>");
Regex is not a good choice for parsing HTML files.
HTML is not strict, nor is its format regular.
Use HtmlAgilityPack.
Why use a parser?
Consider your regex: there are countless inputs that would break it.
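A few of those cases can be tried directly. The sketch below runs the pattern from the question against some made-up inputs; the class name RegexPitfalls and the sample markup are only for illustration. The multi-line case is one likely reason regex.Success is false on a real page, since '.' does not match a newline by default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfalls
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"<div[^>]*id=""thumbs"">(.+?)</div>";

    public static Match Try(string html) => Regex.Match(html, Pattern);

    public static void Main()
    {
        // Matches: single line, id is the last attribute, no nested div.
        Console.WriteLine(Try("<div id=\"thumbs\">a</div>").Success);

        // No match: '.' does not match '\n' unless RegexOptions.Singleline is set.
        Console.WriteLine(Try("<div id=\"thumbs\">\na\n</div>").Success);

        // No match: another attribute follows the id.
        Console.WriteLine(Try("<div id=\"thumbs\" class=\"x\">a</div>").Success);

        // No match: the attribute value uses single quotes.
        Console.WriteLine(Try("<div id='thumbs'>a</div>").Success);

        // Matches, but the lazy group stops at the first </div>,
        // so a nested div cuts the captured content short.
        Console.WriteLine(Try("<div id=\"thumbs\"><div>a</div>b</div>").Groups[1].Value);
    }
}
```

Each of these can be patched individually, but every patch makes the pattern longer and still leaves other valid HTML uncovered.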
You can use this code to retrieve it using HtmlAgilityPack:
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);
// This XPath selects every div whose id is "thumbs".
// Note: SelectNodes returns null when nothing matches.
var itemList = doc.DocumentNode.SelectNodes("//div[@id='thumbs']")
    .Select(p => p.InnerText)
    .ToList();
// itemList now contains the text content of every div whose id is "thumbs"
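If the HTML is already in a string, as in the question, a variant along these lines may be closer to what is needed. It is only a sketch: it assumes the HtmlAgilityPack NuGet package is referenced, and the names ThumbsExtractor and GetThumbsHtml are made up for the example. It returns the inner HTML of the first matching div and guards against the case where no such div exists.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ThumbsExtractor
{
    // Returns the inner HTML of the first <div id="thumbs">, or null if there is none.
    public static string GetThumbsHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a stream

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='thumbs']");
        return node?.InnerHtml;
    }

    public static void Main()
    {
        // Sample markup with an extra attribute, line breaks and a nested div.
        string html = "<html><body><div id=\"thumbs\" class=\"x\">\n"
                    + "<div>a</div>b\n</div></body></html>";

        Console.WriteLine(GetThumbsHtml(html) ?? "(no div with id thumbs)");
    }
}
```

Use InnerText instead of InnerHtml if only the text without tags is wanted.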
No, I don't think he needs escapes. He has @ in front of the pattern. I think this is correct:
<div[^>]*id="thumbs">(.+?)</div>
So no doubled double quotes.
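A quick way to settle what the regex engine actually receives is to print the literal. In a C# verbatim string (@"..."), a quote character inside the string is written as "", so the literal from the question already yields the pattern shown above with ordinary single quote characters. The class name VerbatimQuotes is made up for this sketch.

```csharp
using System;

public static class VerbatimQuotes
{
    // Verbatim literal as written in the question: "" stands for one quote character.
    public const string Verbatim = @"<div[^>]*id=""thumbs"">(.+?)</div>";

    // The same pattern as a regular literal, where the quote is escaped with a backslash.
    public const string Regular = "<div[^>]*id=\"thumbs\">(.+?)</div>";

    public static void Main()
    {
        Console.WriteLine(Verbatim);            // the text handed to the regex engine
        Console.WriteLine(Verbatim == Regular); // True: both literals are the same string
    }
}
```

So the quoting in the question is not what makes the match fail; the cause has to be in the pattern or the input.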
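Try this:
// Balancing groups: (?<depth>) pushes on each nested <div>, (?<-depth>) pops on
// each </div>, so the match ends at the </div> that closes the thumbs div.
// The id value may be quoted with ", &quot; or &#34;.
Regex r = new Regex(@"<div\b[^>]*?\sid=(\""|&quot;|&\#34;)thumbs(\""|&quot;|&\#34;)[^>]*>"
    + @"(?<text>(?>(?!</?div\b).|<div\b[^>]*>(?<depth>)|</div>(?<-depth>))*)"
    + @"(?(depth)(?!))</div>",
    RegexOptions.Singleline);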
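To see depth counting at work on nested markup, here is a small self-contained sketch that uses .NET balancing groups. It is a simplified illustration, not necessarily the exact pattern above: it only accepts a plain double-quoted id, and the names NestedDivDemo and ExtractThumbs as well as the sample input are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedDivDemo
{
    // (?<depth>) pushes for every nested <div ...>, (?<-depth>) pops for every </div>.
    // The pop fails when the stack is empty, which ends the loop at the closing tag
    // of the outer div. (?(depth)(?!)) rejects a match that leaves a nested div open.
    private static readonly Regex ThumbsDiv = new Regex(
        @"<div\b[^>]*\sid=""thumbs""[^>]*>"
        + @"(?<text>(?>(?!</?div\b).|<div\b[^>]*>(?<depth>)|</div>(?<-depth>))*)"
        + @"(?(depth)(?!))</div>",
        RegexOptions.Singleline);

    // Returns the content of the thumbs div, or null when there is no match.
    public static string ExtractThumbs(string html)
    {
        Match m = ThumbsDiv.Match(html);
        return m.Success ? m.Groups["text"].Value : null;
    }

    public static void Main()
    {
        // The thumbs div contains a nested div and is followed by an unrelated one.
        string html = "<div id=\"thumbs\" class=\"x\">\n<div>a</div>b\n</div><div>c</div>";
        Console.WriteLine(ExtractThumbs(html));
    }
}
```

Even with depth counting, comments, script blocks or unclosed tags can still confuse a pattern like this, which is the usual argument for a real HTML parser.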