How do I catch a 404 error in urllib? (Python 3)
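I have read through dozens of examples for similar questions, but I can't get any of the solutions I've seen, or variants of them, to run. I am screen scraping, and I simply want to ignore 404 errors (skip the page). I get
AttributeError: 'module' object has no attribute 'HTTPError'
I have tried 'URLError' as well. I've seen nearly identical syntax accepted as working answers. Any ideas? Here's what I've got: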
    import urllib
    import datetime
    from bs4 import BeautifulSoup

    class EarningsAnnouncement:
        def __init__(self, Company, Ticker, EPSEst, AnnouncementDate, AnnouncementTime):
            self.Company = Company
            self.Ticker = Ticker
            self.EPSEst = EPSEst
            self.AnnouncementDate = AnnouncementDate
            self.AnnouncementTime = AnnouncementTime

    webBaseStr = 'http://biz.yahoo.com/research/earncal/'
    earningsAnnouncements = []
    dayVar = datetime.date.today()
    for dte in range(1, 30):
        currDay = str(dayVar.day)
        currMonth = str(dayVar.month)
        currYear = str(dayVar.year)
        if (len(currDay)==1): currDay = '0' + currDay
        if (len(currMonth)==1): currMonth = '0' + currMonth
        dateStr = currYear + currMonth + currDay
        webString = webBaseStr + dateStr + '.html'
        try:
            #with urllib.request.urlopen(webString) as url: page = url.read()
            page = urllib.request.urlopen(webString).read()
            soup = BeautifulSoup(page)
            tbls = soup.findAll('table')
            tbl6= tbls[6]
            rows = tbl6.findAll('tr')
            rows = rows[2:len(rows)-1]
            for earn in rows:
                earningsAnnouncements.append(EarningsAnnouncement(earn.contents[0], earn.contents[1],
                    earn.contents[3], dateStr, earn.contents[3]))
        except urllib.HTTPError as err:
            if err.code == 404:
                continue
            else:
                raise
        dayVar += datetime.timedelta(days=1)
With urllib (as opposed to urllib2), the exception is urllib.error.HTTPError, not urllib.HTTPError. See the documentation for more information.
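A minimal sketch of that fix, in case it helps. The helper name `fetch_or_skip` and its `opener` parameter are made up for illustration (they are not part of urllib); `opener` is only there so the function can be tried without network access. It imports `urllib.error` and `urllib.request` explicitly instead of relying on a bare `import urllib`, returns `None` on a 404, and re-raises every other HTTP error.

```python
import urllib.error
import urllib.request


def fetch_or_skip(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the server answers 404.

    `fetch_or_skip` and `opener` are illustrative names, not urllib API.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # page does not exist: let the caller skip it
        raise            # any other HTTP status is still an error
```

In a loop like the one in the question, this would be used as `page = fetch_or_skip(webString)` followed by a check for `None`. If the date increment sits at the end of the loop body, advance it before skipping, otherwise a `continue` would retry the same date.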
Do this:
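    import urllib.error    # the exception classes live in urllib.error
    import urllib.request

    try:
        page = urllib.request.urlopen(webString).read()  # webString as in the question
    except urllib.error.HTTPError as e:  # not 'urllib.HTTPError'; a plain URLError has no .code
        print('Error code:', e.code)     # or whatever you want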
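One caveat when choosing which class to catch: `urllib.error.HTTPError` is a subclass of `urllib.error.URLError`, and only `HTTPError` carries a `.code` attribute. A plain `URLError` (for example a DNS failure) has `.reason` instead. The sketch below illustrates the difference; `describe` is a hypothetical helper name, not something urllib provides.

```python
import urllib.error


def describe(exc):
    """Summarise a urllib exception (`describe` is an illustrative helper)."""
    # Test for HTTPError first: it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return 'HTTP error {}'.format(exc.code)
    if isinstance(exc, urllib.error.URLError):
        # A plain URLError has a .reason but no .code attribute.
        return 'URL error: {}'.format(exc.reason)
    raise TypeError('not a urllib error')
```

The same ordering applies to `except` clauses: if both classes are handled, the `except urllib.error.HTTPError` clause should come before `except urllib.error.URLError`.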