urllib.error.HTTPError: HTTP Error 404: Not Found Python while scraping data from Metacritic
I'm trying to scrape movie ratings from Metacritic. Here's the part of the code that throws the error.
text = text.replace("_","-")
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers={'User-Agent':user_agent,}
URL = "http://metacritic.com/" + text
request=urllib.request.Request(URL,None,headers)
try:
    response = urllib.request.urlopen(request)
    data = response.read()
    soup = BeautifulSoup(data,'html.parser')
    metacritic_rating = "Metascore: " + soup.find("span",class_="metascore_w").get_text()
    send_message(metacritic_rating,chat)
except:
    pass
I modified what I had written using this: https://stackoverflow.com/a/42441391/8618880
I cannot use requests.get() because of this: urllib2.HTTPError: HTTP Error 403: Forbidden
I'm looking for a way to get the status code of the page. I was able to find a way when I used requests.get().
I checked all the answers to the question titled "urllib.error.HTTPError: HTTP Error 404: Not Found Python", but none of them helped.
Any help is appreciated.
I think this is what you want:
import urllib.request
import urllib.error
from bs4 import BeautifulSoup

# text, send_message and chat come from the rest of the asker's program
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}
URL = "http://metacritic.com/" + text
request = urllib.request.Request(URL, None, headers)
try:
    response = urllib.request.urlopen(request)
    data = response.read()
    soup = BeautifulSoup(data, 'html.parser')
    metacritic_rating = "Metascore: " + soup.find("span", class_="metascore_w").get_text()
    send_message(metacritic_rating, chat)
except urllib.error.HTTPError as err:
    # err.code holds the HTTP status code of the failed request
    # print(err.code)
    if err.code == 403:
        pass  # <do something>
    else:
        pass
Output (with the print(err.code) line uncommented):
403
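If the goal is just to get the status code whether or not the request succeeds, the same idea can be wrapped in a small helper. This is only a sketch: fetch_status, DEFAULT_HEADERS and the opener parameter are names made up for illustration, and the shortened User-Agent string is an assumption (any browser-like value should behave the same way as the one in the question).

```python
import urllib.error
import urllib.request

# Browser-like User-Agent, as in the question; the exact string is an assumption.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_status(url, headers=None, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes); body is None when the request fails.

    `opener` is a hypothetical hook so the function can be tried without
    network access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, None, headers or DEFAULT_HEADERS)
    try:
        with opener(request) as response:
            # Successful responses expose the code as response.status.
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is err.code.
        return err.code, None
    except urllib.error.URLError:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, None
```

With this, status, body = fetch_status("http://metacritic.com/" + text) gives you the code to branch on (404, 403, 200, ...) before handing body to BeautifulSoup. HTTPError has to be caught before URLError because it is a subclass of it.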