
HtmlAgilityPack C#: SelectNodes always returns null

This is the XPath expression I tried to use with the HtmlAgilityPack C# parser.

//div[@id = 'sc1']/table/tbody/tr/td/span[@class='blacktxt']

I evaluated the XPath expression with a Firefox XPath add-on and successfully got the required items, but the C# code throws a null reference exception.

HtmlAgilityPack.HtmlNodeCollection node = htmldoc.DocumentNode.SelectNodes("//div[@id ='sc1']/table/tbody/tr/td/span[@class='blacktxt']");            
MessageBox.Show(node.ToString());

The node variable is always null. Please help me find a way around this problem. Thank you.

DOM Requires <tbody/> Tags to be Inserted

All common browser extensions for building XPath expressions work on the DOM. Unlike the HTML specs, the DOM specs require <tr/> elements to be inside <tbody/> elements, so browsers add such elements if they are missing. You can easily see the difference by comparing the HTML shown in Firebug (or similar developer tools that work on the DOM) with the raw page source (fetched with wget or a similar tool that does not interpret the markup).

The Solution

Remove the /tbody axis step, and your XPath expression will probably work.

//div[@id = 'sc1']/table/tr/td/span[@class='blacktxt']
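
In C#, the shortened expression could be used roughly as in the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the sample markup and the names TbodyExample and ExtractTexts are made up for illustration. Note that SelectNodes returns null rather than an empty collection when nothing matches, which is why node.ToString() failed in the question, so a null check is worthwhile either way.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TbodyExample
{
    // Hypothetical helper: returns the trimmed text of every node matched by xpath.
    public static List<string> ExtractTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        var texts = new List<string>();
        if (nodes == null)
            return texts;

        foreach (HtmlNode node in nodes)
            texts.Add(node.InnerText.Trim());
        return texts;
    }

    public static void Main()
    {
        // Made-up markup as a server might send it: no <tbody> in the source.
        string html =
            "<div id='sc1'><table><tr><td>" +
            "<span class='blacktxt'>Hello</span>" +
            "</td></tr></table></div>";

        // Expression copied from the browser DOM: finds nothing in the raw HTML.
        var withTbody = ExtractTexts(html,
            "//div[@id='sc1']/table/tbody/tr/td/span[@class='blacktxt']");

        // Same expression without the /tbody step: matches the span.
        var withoutTbody = ExtractTexts(html,
            "//div[@id='sc1']/table/tr/td/span[@class='blacktxt']");

        Console.WriteLine("with /tbody:    " + withTbody.Count);
        Console.WriteLine("without /tbody: " + withoutTbody.Count);
    }
}
```

Returning an empty list instead of null keeps the caller free of null checks; that is a design choice of this sketch, not something HtmlAgilityPack does for you.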

If You Need to Support HTML Both With and Without <tbody/> Tags

For a more general solution, you could replace the /tbody axis step with a descendant-or-self step //, but this could also match rows of nested "inner" tables:

//div[@id = 'sc1']/table//tr/td/span[@class='blacktxt']

A better option is to combine both alternatives with the union operator:

//div[@id = 'sc1']/table/tr/td/span[@class='blacktxt'] | //div[@id = 'sc1']/table/tbody/tr/td/span[@class='blacktxt'] 
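
As a rough check of the union form, the sketch below runs it against two made-up snippets, one without and one with <tbody>. It assumes the HtmlAgilityPack NuGet package is referenced; UnionExample, CountMatches and the sample markup are hypothetical names and data used only for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class UnionExample
{
    // XPath 1.0 union of the two alternatives: rows directly under <table>,
    // or rows under an explicit <tbody>.
    public const string UnionXPath =
        "//div[@id='sc1']/table/tr/td/span[@class='blacktxt']" +
        " | " +
        "//div[@id='sc1']/table/tbody/tr/td/span[@class='blacktxt']";

    // Hypothetical helper: number of nodes matched, treating null as zero.
    public static int CountMatches(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(UnionXPath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Made-up markup without <tbody>.
        string plain =
            "<div id='sc1'><table><tr><td>" +
            "<span class='blacktxt'>A</span></td></tr></table></div>";

        // Made-up markup with an explicit <tbody>.
        string withTbody =
            "<div id='sc1'><table><tbody><tr><td>" +
            "<span class='blacktxt'>B</span></td></tr></tbody></table></div>";

        Console.WriteLine("no tbody:   " + CountMatches(plain));
        Console.WriteLine("with tbody: " + CountMatches(withTbody));
    }
}
```

Unlike the // variant, neither branch of the union descends into nested tables, so spans inside an inner table are not picked up.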

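A cleaner solution is available in XPath 2.0 only. HtmlAgilityPack evaluates XPath through .NET's System.Xml.XPath, which implements XPath 1.0, so this form is for XPath 2.0 processors: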

//div[@id = 'sc1']/table/(tbody, self::*)/tr/td/span[@class='blacktxt']


 