How to check if it is a 404 error page (page does not exist) using HtmlAgilityPack

I am trying to read URLs and get the images on each page. If a page returns a 404, I need to exclude it and not collect images from the error page. How can I do this with HtmlAgilityPack? Here is my code:

var document = new HtmlWeb().Load(completeurl);
var urls = document.DocumentNode.Descendants("img")
          .Select(e => e.GetAttributeValue("src", null))
          .Where(s => !String.IsNullOrEmpty(s)).ToList();

You need to register a PostRequestHandler event on the HtmlWeb instance. It is raised after each document is downloaded and gives you access to the HttpWebResponse object, which has a StatusCode property.

 HtmlWeb web = new HtmlWeb();
 HttpStatusCode statusCode = HttpStatusCode.OK;
 // Record the status code of every response the HtmlWeb instance receives.
 web.PostRequestHandler += (request, response) =>
 {
     if (response != null)
     {
         statusCode = response.StatusCode;
     }
 };

 var doc = web.Load(completeUrl);
 if (statusCode == HttpStatusCode.OK)
 {
     // received a real document
 }

Looking at the HtmlAgilityPack source on GitHub, it is even simpler: HtmlWeb has a StatusCode property that holds the status of the response:

var web = new HtmlWeb();
var document = web.Load(completeurl);

if (web.StatusCode == HttpStatusCode.OK)
{
    var urls = document.DocumentNode.Descendants("img")
          .Select(e => e.GetAttributeValue("src", null))
          .Where(s => !String.IsNullOrEmpty(s)).ToList();
}
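
Since the question is about reading several URLs, the same check can be placed in a loop. The following is only a sketch under stated assumptions: the class name `ImageCollector`, the method `CollectImageUrls`, and the `pageUrls` parameter are hypothetical names introduced for illustration. It relies on the `HtmlWeb.StatusCode` property described above being updated after each `Load` call, which may depend on the HtmlAgilityPack version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

static class ImageCollector
{
    // Hypothetical helper: gathers the img src values from every page
    // that answered with 200 OK and skips all other pages.
    public static List<string> CollectImageUrls(IEnumerable<string> pageUrls)
    {
        var web = new HtmlWeb();
        var images = new List<string>();

        foreach (var pageUrl in pageUrls)
        {
            var document = web.Load(pageUrl);

            // Skip 404 pages (and any other non-200 response).
            if (web.StatusCode != HttpStatusCode.OK)
            {
                continue;
            }

            images.AddRange(document.DocumentNode.Descendants("img")
                .Select(e => e.GetAttributeValue("src", ""))
                .Where(s => !String.IsNullOrEmpty(s)));
        }

        return images;
    }
}
```

Note that this only filters on the HTTP status. A failure that produces no HTTP response at all (for example, a host that cannot be resolved) may still surface as an exception from `Load`, so callers may want to wrap the call in a try/catch.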

Update

The HtmlAgilityPack API has since been updated. The trick is still the same:

var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;

htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};

Be aware of the version you use!

I am using HtmlAgilityPack v1.5.1, and there is no PostRequestHandler event.

In v1.5.1 you have to use the PostResponse field instead. See the example below.

var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;

htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};
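
The snippet above only records the status code; it does not show the page being loaded or the check itself. A minimal end-to-end sketch might look like the following. The URL is a placeholder, the class name `StatusCheck` is hypothetical, and the nullable `HttpStatusCode?` is a design choice of this sketch (not part of the original answer) so that "handler never ran" is not mistaken for 200 OK.

```csharp
using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

class StatusCheck
{
    static void Main()
    {
        // Placeholder URL for illustration only.
        var completeUrl = "https://example.com/some-page";

        var htmlWeb = new HtmlWeb();

        // Stays null until the handler below has actually run.
        HttpStatusCode? lastStatusCode = null;

        htmlWeb.PostResponse = (request, response) =>
        {
            if (response != null)
            {
                lastStatusCode = response.StatusCode;
            }
        };

        var document = htmlWeb.Load(completeUrl);

        if (lastStatusCode == HttpStatusCode.OK)
        {
            // Only a real page reaches this branch; extract the image URLs.
            var urls = document.DocumentNode.Descendants("img")
                .Select(e => e.GetAttributeValue("src", ""))
                .Where(s => !String.IsNullOrEmpty(s))
                .ToList();

            Console.WriteLine($"Found {urls.Count} image(s).");
        }
        else
        {
            Console.WriteLine($"Skipping {completeUrl}: status {lastStatusCode}");
        }
    }
}
```

If the same `HtmlWeb` instance is reused for several pages, resetting `lastStatusCode` to `null` before each `Load` call avoids reading a stale value from the previous request.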

The differences are small, but they are there.

Hope this saves someone some time.
