
HTML Agility Pack cannot find PageMap tag

I am using the HTML Agility Pack to extract article information from HTML pages. I can find everything else I need in the document, but no matter what I try I cannot find the PageMap element. I created a test document that isolates just the PageMap, and still no luck.

This is the test HTML:

<html>
    <head>

        <PageMap>
            <DataObject type="document">
                <Attribute name="article_title">Test Title</Attribute>
                <Attribute name="article_publication_name">Test Publication Name</Attribute>
                <Attribute name="article_author">Test Authro | The Test</Attribute>
                <Attribute name="article_description">A test of test and test test test!</Attribute>
                <Attribute name="image_src">http://www.google.com</Attribute>
                <Attribute name="article_comments">0</Attribute>
                <Attribute name="article_date_original">10/31/2015</Attribute>
                <Attribute name="article_date_updated">10/31/2015</Attribute>
            </DataObject>
        </PageMap>


    </head>
    <body>
        test
    </body>
</html>

This is the code I am using:

string strPageHTML = File.ReadAllText(@"test.htm");

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(strPageHTML);

HtmlNode htmnArticle = doc.DocumentNode.SelectSingleNode("//PageMap");
tbMessagePreview.Text = htmnArticle.InnerHtml;

Both the live and the test HTML load fine, but the htmnArticle node is always null. Any suggestions would be appreciated.

Use //pagemap. HtmlAgilityPack normalizes node names to lower case (see "HTML Agility Pack Parsing With Upper & Lower Case Tags?"):

HtmlNode htmnArticle = doc.DocumentNode.SelectSingleNode("//pagemap");
tbMessagePreview.Text = htmnArticle.InnerHtml;
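
Since SelectSingleNode returns null when nothing matches, a null check makes this kind of failure easier to diagnose than a NullReferenceException on the next line. Below is a minimal sketch, assuming the test HTML from the question; PageMapReader and ReadAttribute are hypothetical names introduced here for illustration and are not part of HtmlAgilityPack.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PageMapReader
{
    // Returns the text of <Attribute name="..."> inside <PageMap>,
    // or null when the PageMap or the attribute is missing.
    public static string ReadAttribute(string html, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Element names in the XPath expression must be lower case.
        HtmlNode pageMap = doc.DocumentNode.SelectSingleNode("//pagemap");
        if (pageMap == null)
            return null;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection attributes = pageMap.SelectNodes(".//attribute");
        if (attributes == null)
            return null;

        foreach (HtmlNode attribute in attributes)
        {
            // Attribute values such as "article_title" keep their original case.
            if (attribute.GetAttributeValue("name", "") == attributeName)
                return attribute.InnerText.Trim();
        }
        return null;
    }
}
```

With the question's variables this could be called as PageMapReader.ReadAttribute(strPageHTML, "article_title"), and the result checked for null before assigning it to tbMessagePreview.Text.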

Side note: inspecting doc.DocumentNode.InnerHtml helps to see how nodes are normalized.
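
Another way to check the normalization is to list the element names the parser actually produced. This is a rough sketch under the same assumptions as the question's code; NodeNameDump and ElementNames are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class NodeNameDump
{
    // Collects the names of all element nodes as HtmlAgilityPack exposes them.
    public static List<string> ElementNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();
        foreach (HtmlNode node in doc.DocumentNode.Descendants())
        {
            // Skip text and comment nodes; only element names matter for XPath.
            if (node.NodeType == HtmlNodeType.Element)
                names.Add(node.Name);
        }
        return names;
    }
}
```

Printing the result with string.Join(", ", NodeNameDump.ElementNames(strPageHTML)) shows the names to use in an XPath expression; for the test document the PageMap element should appear as pagemap.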
