How to check whether a page is a 404 error page (page does not exist) using HtmlAgilityPack
I am reading URLs and collecting the images on each page. I need to exclude a page if it returns a 404, so that no images are taken from the error page. How can I do this using HtmlAgilityPack? Here is my code:
var document = new HtmlWeb().Load(completeurl);
var urls = document.DocumentNode.Descendants("img")
    .Select(e => e.GetAttributeValue("src", null))
    .Where(s => !String.IsNullOrEmpty(s)).ToList();
You'll need to register a PostRequestHandler event handler on the HtmlWeb instance. The event is raised after each downloaded document and gives you access to the HttpWebResponse object, which has a StatusCode property.
// Requires: using System.Net; using HtmlAgilityPack;
HtmlWeb web = new HtmlWeb();
HttpStatusCode statusCode = HttpStatusCode.OK;
// Record the status code of each response as it arrives.
web.PostRequestHandler += (request, response) =>
{
    if (response != null)
    {
        statusCode = response.StatusCode;
    }
};
var doc = web.Load(completeUrl);
if (statusCode == HttpStatusCode.OK)
{
    // received a real document
}
Looking at the HtmlAgilityPack source code on GitHub, it's even simpler: HtmlWeb has a StatusCode property that is set to this value:
var web = new HtmlWeb();
var document = web.Load(completeurl);
if (web.StatusCode == HttpStatusCode.OK)
{
    var urls = document.DocumentNode.Descendants("img")
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s)).ToList();
}
There has been an update to the HtmlAgilityPack API. The trick is still the same:
var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;
htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};
Be aware of the version you use! I am using HtmlAgilityPack v1.5.1, and there is no PostRequestHandler event. In v1.5.1 you have to use the PostResponse field instead. See the example below.
var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;
htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};
The differences are small, but they are there.
Hope this saves someone some time.
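For completeness, here is a minimal sketch of how the status code captured through PostResponse might gate the image extraction from the question. The ImageScraper class and GetImageUrls method are hypothetical names introduced only for illustration. The sketch assumes a version of HtmlAgilityPack where PostResponse is available (such as v1.5.1) and assumes that Load returns the error page body for a 404 instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

static class ImageScraper
{
    // Hypothetical helper: returns the img src values of a page,
    // or an empty list when the response status is anything other than 200 OK.
    public static List<string> GetImageUrls(string pageUrl)
    {
        var htmlWeb = new HtmlWeb();
        var lastStatusCode = HttpStatusCode.OK;

        // Capture the status code of the response for this request.
        htmlWeb.PostResponse = (request, response) =>
        {
            if (response != null)
            {
                lastStatusCode = response.StatusCode;
            }
        };

        var document = htmlWeb.Load(pageUrl);

        // Skip error pages (404 and any other non-OK status).
        if (lastStatusCode != HttpStatusCode.OK)
        {
            return new List<string>();
        }

        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !String.IsNullOrEmpty(s))
            .ToList();
    }
}
```

Calling ImageScraper.GetImageUrls(completeurl) for each URL would then yield an empty list for pages that do not exist, so the caller does not need a separate 404 check.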