Regex to HtmlAgilityPack C#

I want to know how to convert my code, which uses a regex to match strings on a website, into code that uses the HtmlAgilityPack library.

Example HTML:

<div class="element"><div class="title"><a href="" title="A.1">A.1</a></div></div>
<div class="element"><div class="title"><a href="" title="A.2">A.2</a></div></div>

My current code is the following:

List<string> Cap = new List<string>();
WebClient web = new WebClient();
string url = web.DownloadString("");
// Capture the value of every title="..." attribute in the downloaded HTML.
MatchCollection cap = Regex.Matches(url, "title=\"(.+?)\">", RegexOptions.Singleline);
foreach (Match m in cap)
    Cap.Add(m.Groups[1].Value);
lst_Cap.ItemsSource = Cap;

And it works.

I've tried with HtmlAgilityPack:

HtmlWeb web = new HtmlWeb();
HtmlDocument Web = web.Load(""); // for example
List<string> Cap = new List<string>();
foreach (HtmlNode node in Web.DocumentNode.SelectNodes("//*[@id=\"content\"]/div/div[3]/div[2]/div[1]/a"))
    Cap.Add(node.InnerText);

But it only adds A.1.

How can I do this?

Your regex "title=\"(.+?)\">" matches and captures the value of any title attribute, in any tag inside the HTML document.

So, use the //*[@title] XPath instead. It selects every element node ( * ) that has a title attribute. Then iterate over each node's attributes and, whenever an attribute's name is title, add its value to the list:

var nodes = Web.DocumentNode.SelectNodes("//*[@title]");
if (nodes != null) // SelectNodes returns null when nothing matches
   foreach (var node in nodes)
       foreach (var attribute in node.Attributes)
           if (attribute.Name == "title")
               Cap.Add(attribute.Value);

Or using LINQ:

var nodes = Web.DocumentNode.SelectNodes("//*[@title]");
var res = nodes.Where(p => p.HasAttributes)
                 .Select(m => m.GetAttributeValue("title", string.Empty))
                 .Where(l => !string.IsNullOrEmpty(l))
                 .ToList();
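
If only the links inside the div.title blocks are wanted, rather than every title attribute on the page, a narrower XPath is one option. Positional steps such as div[1] in the question's XPath restrict the match to the first sibling at that level, which may be why only A.1 was returned. The following is a minimal sketch, assuming the page markup matches the example HTML from the question; the TitleScraper class and ExtractTitles method are hypothetical names, and the HTML is loaded from a string so the snippet does not depend on any URL:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class TitleScraper // hypothetical helper, not part of HtmlAgilityPack
{
    public static List<string> ExtractTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select every <a> nested as div.element > div.title > a,
        // without positional indexes such as div[1].
        var links = doc.DocumentNode.SelectNodes(
            "//div[@class='element']/div[@class='title']/a");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links == null)
            return new List<string>();

        return links
            .Select(a => a.GetAttributeValue("title", string.Empty))
            .Where(t => !string.IsNullOrEmpty(t))
            .ToList();
    }

    static void Main()
    {
        // Sample markup copied from the question.
        const string html =
            "<div class=\"element\"><div class=\"title\"><a href=\"\" title=\"A.1\">A.1</a></div></div>" +
            "<div class=\"element\"><div class=\"title\"><a href=\"\" title=\"A.2\">A.2</a></div></div>";

        foreach (var title in ExtractTitles(html))
            Console.WriteLine(title);
    }
}
```

The class-based predicates assume the class attribute is exactly "element" or "title"; if the real page uses several classes on one element, a contains(@class, '...') test would be needed instead.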

