

urllib.error.HTTPError: HTTP Error 404: Not Found (Python) while scraping data from Metacritic

I'm trying to scrape movie ratings from Metacritic. Here's the part of the code that throws the error:

import urllib.request

from bs4 import BeautifulSoup

# `text`, `send_message` and `chat` are defined elsewhere in my bot code.
text = text.replace("_", "-")
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}
URL = "http://metacritic.com/" + text
request = urllib.request.Request(URL, None, headers)
try:
    response = urllib.request.urlopen(request)
    data = response.read()
    soup = BeautifulSoup(data, 'html.parser')
    metacritic_rating = "Metascore: " + soup.find("span", class_="metascore_w").get_text()
    send_message(metacritic_rating, chat)
except:
    # A bare except silently swallows every error, including the HTTPError
    pass
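A 404 means the server has no page at the requested URL, so the URL built above is worth checking first: it is just "http://metacritic.com/" plus the slug, with no section in the path. The sketch below builds the URL in one place so it is easy to print and verify. It assumes (not confirmed in this thread) that movie pages live under https://www.metacritic.com/movie/<slug>; `BASE_URL` and `metacritic_url` are hypothetical names.

```python
from urllib.parse import quote

# ASSUMPTION: Metacritic movie pages are served under /movie/<slug>.
# Adjust this prefix if the site layout differs.
BASE_URL = "https://www.metacritic.com/movie/"


def metacritic_url(title: str) -> str:
    """Build a (hypothetical) Metacritic movie URL from a title like 'The_Matrix'."""
    # Lower-case and turn underscores and spaces into hyphens, as the question does with "_".
    slug = title.strip().lower().replace("_", "-").replace(" ", "-")
    # Percent-encode anything that is not URL-safe (e.g. accented letters) as UTF-8.
    return BASE_URL + quote(slug)


print(metacritic_url("The_Matrix"))  # https://www.metacritic.com/movie/the-matrix
```

Printing the final URL and opening it in a browser is a quick way to tell a wrong path (404) from a blocked request (403).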

I modified what I had written based on this answer: https://stackoverflow.com/a/42441391/8618880

I can't use requests.get() because of this problem: urllib2.HTTPError: HTTP Error 403: Forbidden
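The 403 in that linked question is often caused by the default User-Agent: without an explicit header, urllib identifies itself as "Python-urllib/<version>", which some sites refuse. The sketch below, using only the standard library and no network access, shows that default and how a header passed to `Request` replaces it; the shortened browser string is just an illustrative value.

```python
import urllib.request

# The default opener identifies urllib as "Python-urllib/<major>.<minor>",
# a User-Agent that some sites answer with 403 Forbidden.
default_ua = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_ua)  # e.g. Python-urllib/3.12

# A browser-like User-Agent set on the Request takes precedence: the opener
# only adds its default when the request has no User-Agent of its own.
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
request = urllib.request.Request("http://metacritic.com/", headers={"User-Agent": user_agent})

# Request normalises header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
```

The same idea applies to requests: `requests.get(url, headers=headers)` sends the given User-Agent instead of its own default.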

I'm looking for a way to get the status code of the page. I was able to do this when I used requests.get().

I checked all the answers to questions titled "urllib.error.HTTPError: HTTP Error 404: Not Found Python" but couldn't find anything that helped.

Any help is appreciated.

I think this is what you want:

import urllib.error
import urllib.request

from bs4 import BeautifulSoup

# `text`, `send_message` and `chat` come from the code in the question.
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}
URL = "http://metacritic.com/" + text
request = urllib.request.Request(URL, None, headers)

try:
    response = urllib.request.urlopen(request)
    data = response.read()
    soup = BeautifulSoup(data, 'html.parser')
    metacritic_rating = "Metascore: " + soup.find("span", class_="metascore_w").get_text()
    send_message(metacritic_rating, chat)
except urllib.error.HTTPError as err:
    # err.code is the HTTP status code the server returned (403, 404, ...)
    print(err.code)
    if err.code == 403:
        pass  # <do something>: handle 403 Forbidden here
    else:
        pass  # ignore any other status code

Output:

403
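The code above only sees the status code when the request fails. To get it in both cases, as `status_code` does with requests, a minimal sketch is to read `response.status` on success and `err.code` on an `HTTPError`. `get_status` and `DemoHandler` are hypothetical names, and the local server only stands in for Metacritic so the example runs offline.

```python
import http.server
import threading
import urllib.error
import urllib.request


def get_status(url, headers=None):
    """Return the HTTP status code for url, whether or not the request succeeds."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status  # success: code of the final response
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx: urlopen raises, but the code is on the exception


# --- Demo only: a local server that answers 200 for "/" and 404 otherwise ---
class DemoHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200 if self.path == "/" else 404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Demo only: bypass any proxy set in the environment so localhost is reached directly.
urllib.request.install_opener(urllib.request.build_opener(urllib.request.ProxyHandler({})))

server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

ok_code = get_status(base + "/")
missing_code = get_status(base + "/no-such-movie")
print(ok_code, missing_code)  # 200 404

server.shutdown()
server.server_close()
```

With the real site you would call `get_status(URL, headers)` using the URL and headers from the question.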

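Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.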

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM