What is the fastest way to get an HTML document node using XPath and HtmlAgilityPack?

Question

In my application I need to get the URL of a blog post's image. I am using HtmlAgilityPack for this.

This is the code I have so far:

// Requires: using System.Net; using HtmlAgilityPack;
static string GetBlogImageUrl(string postUrl)
{
    string imageUrl = string.Empty;

    using (WebClient client = new WebClient())
    {
        // Download the post's HTML (a blocking network call).
        string htmlString = client.DownloadString(postUrl);
        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(htmlString);
        string xPath = "/html/body/div[contains(@class, 'container')]/div[contains(@class, 'content_border')]/div[contains(@class, 'single-post')]/main[contains(@class, 'site-main')]/article/header/div[contains(@class, 'featured_image')]/img";
        HtmlNode node = htmlDocument.DocumentNode.SelectSingleNode(xPath);
        // SelectSingleNode returns null when nothing matches the XPath.
        if (node != null)
        {
            imageUrl = node.GetAttributeValue("src", string.Empty);
        }
    }

    return imageUrl;
}

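The problem is that this is too slow. When I ran some tests, I noticed that extracting the image URL from a given page takes about three seconds. That becomes a problem when I load a feed and try to add multiple posts.

I tried using the absolute XPath of the element I want to load, but saw no improvement. Is there a faster way to achieve this?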
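
One way to narrow this down is to time the download and the parse-and-query step separately. The sketch below is only an illustration: `ExtractionTiming` and `Measure` are hypothetical names that do not appear in the original code, and `fetchHtml` stands in for whatever call returns the page source. If nearly all of the time turns out to be in the fetch step, changing the XPath is unlikely to help much.

```csharp
using System;
using System.Diagnostics;
using HtmlAgilityPack;

static class ExtractionTiming
{
    // Hypothetical helper: runs the fetch and the parse/query as two timed steps.
    // fetchHtml is any delegate that returns the page source as a string.
    public static string Measure(Func<string> fetchHtml, string xPath,
                                 out TimeSpan fetchTime, out TimeSpan queryTime)
    {
        // Step 1: obtain the HTML (for a real page, this is the network download).
        Stopwatch stopwatch = Stopwatch.StartNew();
        string html = fetchHtml();
        fetchTime = stopwatch.Elapsed;

        // Step 2: parse the HTML and run the XPath query.
        stopwatch.Restart();
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);
        HtmlNode node = document.DocumentNode.SelectSingleNode(xPath);
        string imageUrl = node == null
            ? string.Empty
            : node.GetAttributeValue("src", string.Empty);
        queryTime = stopwatch.Elapsed;

        return imageUrl;
    }
}
```

Passing `() => client.DownloadString(postUrl)` as `fetchHtml`, together with the XPath from the code above, gives one duration for the download and one for the HtmlAgilityPack work, so the two can be compared directly.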

Answer 1

Could you try this code and see whether it is faster?

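// Requires: using System.Linq; using HtmlAgilityPack;
string Url = "http://blog.cedrotech.com/5-tendencias-mobile-que-sua-empresa-precisa-acompanhar/";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);

// Find the first <div> whose class attribute contains "featured_image".
var featureDiv = doc.DocumentNode.Descendants("div")
    .FirstOrDefault(_ => _.Attributes.Contains("class") && _.Attributes["class"].Value.Contains("featured_image"));

// Take its first <img> child; both lookups yield null when nothing matches.
var img = featureDiv?.ChildNodes.FirstOrDefault(_ => _.Name.Equals("img"));

// Read the attribute's string value rather than the HtmlAttribute object.
var imgUrl = img?.GetAttributeValue("src", string.Empty) ?? string.Empty;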
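
Since the question asks about XPath specifically, the same lookup can also be written as a single relative XPath query instead of the full absolute path. This is a minimal sketch, not a measured improvement: `FeaturedImage` and `ExtractUrl` are hypothetical names, the helper takes HTML that has already been downloaded, and whether it is any faster than the absolute path would have to be timed on the real page.

```csharp
using HtmlAgilityPack;

static class FeaturedImage
{
    // Hypothetical helper: returns the src of the first <img> that is a direct child
    // of a <div> whose class attribute contains "featured_image", or an empty string.
    public static string ExtractUrl(string html)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        // "//" searches all descendants, so no ancestor chain has to be spelled out.
        HtmlNode img = document.DocumentNode.SelectSingleNode(
            "//div[contains(@class, 'featured_image')]/img");

        // SelectSingleNode returns null when nothing matches.
        return img == null ? string.Empty : img.GetAttributeValue("src", string.Empty);
    }
}
```

Like the `Contains` check above, `contains(@class, ...)` is a substring match, so it would also match a class name such as `featured_image_small`.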

Solution 1
0 2017-01-24 19:37:45
