Python scraping: the web page does not exist, but the website redirects to another page

Question

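I'm trying to find a way to tell whether a web page exists. There are many ways to do this, such as httplib2, urlparse, and requests. But in my case, when a page does not exist, the website redirects me to the home page, for example https://www.thenews.com.pk/latest/category/sports/2015-09-21

Is there any way to catch this?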

Answer 1

You can check whether the final URL is the one you were redirected to, and whether the response has any redirect history.

>>> import requests
>>> target_url = "https://www.thenews.com.pk/latest/category/sports/2015-09-21"
>>> response = requests.get(target_url)
>>> response.history[0].url
u'https://www.thenews.com.pk/latest/category/sports/2015-09-21'
>>> response.url
u'https://www.thenews.com.pk/'
>>> response.history and response.url == 'https://www.thenews.com.pk/' != target_url
True
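
The same check can be wrapped in a small helper. This is only a sketch: the names landed_on_homepage and page_exists are made up for illustration, and it assumes that "ending up at the site root after asking for a deeper path" is how this particular site signals a missing page, which may not hold for other sites.

```python
from urllib.parse import urlsplit


def landed_on_homepage(requested_url, final_url):
    """Return True if a non-root URL was requested but the final URL is the site root."""
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    requested_is_root = requested.path in ("", "/")
    final_is_root = final.path in ("", "/") and not final.query
    return final_is_root and not requested_is_root


def page_exists(url, timeout=10):
    """Fetch url (following redirects) and guess whether the page really exists."""
    # Imported here so the pure helper above works without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    if response.status_code == 404:
        return False
    # response.history is non-empty only if at least one redirect was followed.
    if response.history and landed_on_homepage(url, response.url):
        return False
    return True
```

Calling page_exists(target_url) with the URL from the question would be expected to return False as long as the site keeps redirecting missing pages to its home page.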

Answer 2

The URL you mentioned returns a redirect status code (307) that you can catch. See here:

$ curl -i https://www.thenews.com.pk/latest/category/sports/2015-09-21
HTTP/1.1 307 Temporary Redirect
Date: Sun, 26 Mar 2017 10:13:39 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=ddcd246615efb68a7c72c73f480ea81971490523219; expires=Mon, 26-Mar-18 10:13:39 GMT; path=/; domain=.thenews.com.pk; HttpOnly
Set-Cookie: bf_session=b02fb5b6cc732dc6c3b60332288d0f1d4f9f7360; expires=Sun, 26-Mar-2017 11:13:39 GMT; Max-Age=3600; path=/; HttpOnly
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: https://www.thenews.com.pk/
X-Cacheable: YES
X-Varnish: 654909723
Age: 0
Via: 1.1 varnish
X-Age: 0
X-Cache: MISS
Access-Control-Allow-Origin: *
Server: cloudflare-nginx
CF-RAY: 345956a8be8a7289-AMS
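
To catch the same 307 from Python, one option is to tell requests not to follow redirects and inspect the status code and Location header yourself. The sketch below is an assumption about how you might structure that; describe_response and probe are hypothetical names, and the set of redirect codes is just the common ones.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def describe_response(status_code, location=None):
    """Classify a single, unfollowed HTTP response as (kind, redirect_target)."""
    if status_code in REDIRECT_CODES:
        return "redirect", location
    if status_code == 404:
        return "missing", None
    if 200 <= status_code < 300:
        return "ok", None
    return "other", None


def probe(url, timeout=10):
    """Request url without following redirects and classify the first response."""
    # Imported here so describe_response can be used without requests installed.
    import requests

    response = requests.get(url, allow_redirects=False, timeout=timeout)
    return describe_response(response.status_code, response.headers.get("Location"))
```

For the URL in the question, probe would be expected to report a redirect whose target is the home page, matching the Location header in the curl output above.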
