Is there a way to detect 404 pages using HtmlAgilityPack?

I am parsing a forum where some threads have already been deleted. Opening one of them still returns a page, but with a message that says "Thread no longer exists". Is there a way to detect this with HtmlAgilityPack directly?

Or do I have to compare the InnerHtml, or something along those lines?

A 404 is not actually being returned. If it were, you could just check the status code of the HTTP response.
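For the case where the server does return a real 404, a minimal sketch of the status check might look like the following. It assumes the page is fetched with HtmlAgilityPack's `HtmlWeb` class and reads its `StatusCode` property after loading; the class and method names (`StatusCheck`, `IsNotFound`, `IsHttpNotFound`) are made up for illustration. Whether `Load` throws on an error response may depend on the library version, so the sketch handles both paths.

```csharp
using System.Net;
using HtmlAgilityPack;

public static class StatusCheck
{
    // Pure helper: true only for an HTTP 404 status.
    public static bool IsNotFound(HttpStatusCode status)
    {
        return status == HttpStatusCode.NotFound;
    }

    // Fetches the URL and reports whether the server itself answered 404.
    // A deleted-thread page served with 200 will return false here.
    public static bool IsHttpNotFound(string url)
    {
        var web = new HtmlWeb();
        try
        {
            web.Load(url);
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse response)
        {
            // Assumption: some versions may surface error statuses as exceptions.
            return IsNotFound(response.StatusCode);
        }

        // HtmlWeb records the status code of the last request.
        return IsNotFound(web.StatusCode);
    }
}
```

For the forum in question this check alone is not enough, because the deleted-thread page comes back with status 200.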

That said, you are getting a 200 response with the error embedded in the HTML, so you will have to parse the HTML (traverse the DOM, or whatever you want to call it) and determine from its content whether the request failed.

It appears that there could be several different error messages, so I would try to make your comparison generic: look for the "notify administrator" link, or check for class="blockrow restore", which is perhaps only used on the error page.
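A rough sketch of that kind of generic check is shown below. It assumes the error page really does contain an element with both the `blockrow` and `restore` classes, or a link whose text includes "notify administrator"; neither has been verified against the actual forum markup, and the names `DeletedThreadDetector` and `LooksDeleted` are hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DeletedThreadDetector
{
    // XPath 1.0 test for an element whose class attribute contains
    // both the "blockrow" and "restore" tokens, in any order.
    private const string ErrorBlockXPath =
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' blockrow ')" +
        " and contains(concat(' ', normalize-space(@class), ' '), ' restore ')]";

    // Returns true when the HTML looks like the forum's error page.
    public static bool LooksDeleted(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Marker 1 (assumed): the error container class.
        bool hasErrorBlock =
            doc.DocumentNode.SelectSingleNode(ErrorBlockXPath) != null;

        // Marker 2 (assumed): a link mentioning "notify administrator".
        bool hasNotifyLink = doc.DocumentNode
            .Descendants("a")
            .Any(a => a.InnerText.IndexOf(
                "notify administrator",
                StringComparison.OrdinalIgnoreCase) >= 0);

        return hasErrorBlock || hasNotifyLink;
    }
}
```

Checking for structural markers rather than the exact message text means the check keeps working if the forum uses different wording for different errors. The cost is that it can give a false positive if a normal thread page happens to use the same class or link.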

Hope that helps.
