Can we reload a page/url in python using urllib or urllib2 or requests or mechanize?
I am trying to open a page/link and capture the content in it. It sometimes gives me the content I need, and sometimes throws an error. I see that if I refresh the page a few times, I get the content.
So, I want to reload the page and capture it.
Here is my pseudocode:
attempts = 0
while attempts:
    try:
        open_page = urllib2.Request('http://www.xyz.com')
        # Or I think we can also do urllib2.urlopen('http://www.xyz.com')
        break
    except:
        # here I want to refresh/reload the page
        attempts += 1
My questions are:
1. How can I reload the page using urllib, urllib2, requests, or mechanize?
2. Can we put the try-catch in a loop?
Thanks!
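(To answer question 2 directly: yes, a `try`/`except` inside a loop is the usual pattern. A minimal self-contained sketch, using a stand-in `flaky_fetch` function in place of a real `urllib2.urlopen` call so the retry logic runs without a network:)

```python
# A bounded retry loop: call `fetch` up to `max_attempts` times,
# re-raising the last error if every attempt fails.
def fetch_with_retries(fetch, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: propagate the last error

# Stand-in for urllib2.urlopen / requests.get: fails twice, then succeeds.
calls = {'count': 0}
def flaky_fetch():
    calls['count'] += 1
    if calls['count'] < 3:
        raise IOError("transient failure")
    return "page content"

print(fetch_with_retries(flaky_fetch))  # prints "page content" on the third attempt
```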
If attempts equals 0, the while attempts loop never starts. I would count backwards instead, initializing attempts to the number of reloads you want:
attempts = 10
while attempts:
    try:
        open_page = urllib2.urlopen('http://www.xyz.com')
    except:
        attempts -= 1
    else:
        attempts = False
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

attempts = 10
retries = Retry(total=attempts,
                backoff_factor=0.1,
                status_forcelist=[500, 502, 503, 504])

sess = requests.Session()
sess.mount('http://', HTTPAdapter(max_retries=retries))
sess.mount('https://', HTTPAdapter(max_retries=retries))
sess.get('http://www.google.co.nz/')
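For reference (not part of the original answer): `total` caps the number of retries, `status_forcelist` names the HTTP status codes that trigger one, and `backoff_factor` scales the sleep between attempts; urllib3 sleeps roughly `backoff_factor * 2 ** (n - 1)` seconds before the n-th retry. A quick sketch of that schedule:

```python
# Approximate urllib3 backoff schedule for backoff_factor=0.1 (first few retries).
backoff_factor = 0.1
delays = [backoff_factor * (2 ** (n - 1)) for n in range(1, 6)]
print(delays)  # [0.1, 0.2, 0.4, 0.8, 1.6]
```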
The following function can retry after an exception is raised or when the HTTP response status code is not 200.
import time
import traceback

import requests

def retrieve(url):
    while 1:
        try:
            response = requests.get(url)
            if response.ok:
                return response
            else:
                print(response.status_code)
                time.sleep(3)
                continue
        except:
            print(traceback.format_exc())
            time.sleep(3)
            continue
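One caveat (my addition, not the answerer's): `retrieve` above loops forever if the site never recovers. A bounded variant of the same idea, with the HTTP getter injectable so the logic can be exercised without a network:

```python
import time

import requests

def retrieve_bounded(url, max_tries=5, delay=3, get=requests.get):
    """Like retrieve(), but give up after max_tries attempts."""
    last_error = None
    for _ in range(max_tries):
        try:
            response = get(url)
            if response.ok:
                return response
            # Non-2xx status: remember it and retry after a pause.
            last_error = RuntimeError("HTTP %d" % response.status_code)
        except requests.RequestException as exc:
            last_error = exc
        time.sleep(delay)
    raise last_error
```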