
Attribute Error With Beautiful Soup And Python

I had a working piece of code, and then I ran it today and it's broken. I have pulled out the relevant section that is giving me problems.

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')

soup = BeautifulSoup(webpage.content) 
links = soup.find("div",{"class":"main row grid-padding"}).find_all("h2",{"class":"node-title"})

for link in links:
    print(link.a["href"]) 

This gives me the error "AttributeError: 'NoneType' object has no attribute 'find_all'"

What precisely is this error telling me?

find_all() is a valid method in the Beautiful Soup documentation. Looking through the webpage's source code, my path to my desired object seems to make sense.

I think something must have changed with the website, because I don't see how my code could just stop working. But I don't understand the error message that well...

Thanks for any help you can give!

The site you are trying to parse doesn't "like" your user agent and returns a 403 error; the parser then fails because it cannot find the div. Try changing the user agent to that of one of the browsers:

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=', headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'})
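A variation on the same idea: the header can be attached once to a requests.Session so every request made through it carries it. This is only a sketch; the User-Agent string is an example browser UA, not the only value that works, and the actual GET is commented out so the snippet runs without network access:

```python
import requests

# Set a browser-like User-Agent once on a Session; every request made
# through this session will send it automatically.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
})

# webpage = session.get('http://www.bbcgoodfood.com/search/recipes?query=')
print(session.headers["User-Agent"])
```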

This is because when you tried to access the page, it gave you permission denied, so soup.find() returns None, and None has no attribute find_all(), which raises an AttributeError.
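You can reproduce the mechanics of this error without any network access; in this sketch a plain None stands in for the failed soup.find() call:

```python
# soup.find() returns None when nothing matches; calling any method on
# None raises exactly the error from the question.
result = None  # stand-in for soup.find("div", {"class": "main row grid-padding"})

try:
    result.find_all("h2", {"class": "node-title"})
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'find_all'

# A defensive version checks before chaining:
links = result.find_all("h2") if result is not None else []
print(links)  # []
```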

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')


print(webpage.content)
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;bbcgoodfood&#46;com&#47;search&#47;recipes&#63;" on this server.<P>
Reference&#32;&#35;18&#46;4fa9cd17&#46;1428789762&#46;680369dc
</BODY>
</HTML>

If you resolve this by adding a header with a proper user agent, as @Vader suggested, your code will then run fine:

...
headers = {'User-agent': 'Mozilla/5.0'}
webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=', headers=headers)

soup = BeautifulSoup(webpage.content) 
links = soup.find("div",{"class":"main row grid-padding"}).find_all("h2",{"class":"node-title"})

for link in links:
    print(link.a["href"])

/recipes/4942/lemon-drizzle-cake
/recipes/3092/ultimate-chocolate-cake
/recipes/3228/chilli-con-carne
/recipes/3229/yummy-scrummy-carrot-cake
/recipes/1223/bestever-brownies
/recipes/1167651/chicken-and-chorizo-jambalaya
/recipes/2089/spiced-carrot-and-lentil-soup
/recipes/1521/summerinwinter-chicken
/recipes/1364/spicy-root-and-lentil-casserole
/recipes/4814/mustardstuffed-chicken
/recipes/4622/classic-scones-with-jam-and-clotted-cream
/recipes/333614/red-lentil-chickpea-and-chilli-soup
/recipes/5605/falafel-burgers
/recipes/11695/raspberry-bakewell-cake
/recipes/4686/chicken-biryani
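More generally, the lookup can be made robust against the page changing again by guarding each step. The HTML below is a made-up stand-in mimicking the page structure from the question, so the snippet runs offline:

```python
from bs4 import BeautifulSoup

# Made-up HTML modeled on the question's selectors; the real page differs,
# this only demonstrates the guard pattern.
html = """
<div class="main row grid-padding">
  <h2 class="node-title"><a href="/recipes/4942/lemon-drizzle-cake">Lemon drizzle cake</a></h2>
  <h2 class="node-title"><a href="/recipes/3092/ultimate-chocolate-cake">Ultimate chocolate cake</a></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # explicit parser avoids bs4's warning
container = soup.find("div", {"class": "main row grid-padding"})

hrefs = []
if container is not None:  # guard against the AttributeError from the question
    for h2 in container.find_all("h2", {"class": "node-title"}):
        if h2.a is not None and h2.a.has_attr("href"):
            hrefs.append(h2.a["href"])

print(hrefs)
```

If the div disappears again (or the request is blocked), this prints an empty list instead of crashing, which makes the failure easier to diagnose.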
