
AttributeError with Beautiful Soup and Python

I had a working piece of code, but when I ran it today it was broken. I have pulled out the relevant section that is giving me problems.

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')

soup = BeautifulSoup(webpage.content) 
links = soup.find("div",{"class":"main row grid-padding"}).find_all("h2",{"class":"node-title"})

for link in links:
    print(link.a["href"]) 

This gives me the error "AttributeError: 'NoneType' object has no attribute 'find_all'".

What precisely is this error telling me?

find_all() is a valid method in the Beautiful Soup documentation, and looking through the webpage's source code, my path to the desired element seems to make sense.

I think something must have changed on the website, because I don't see how my code could just stop working. But I don't understand the error message that well...

Thanks for any help you can give!

The site you are trying to parse doesn't "like" your user agent and returns a 403 error, so the parser then fails because it cannot find the div. Try changing the user agent to that of one of the common browsers:

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=', headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'})
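If you want to verify this before changing anything, check the response status first. A minimal sketch; raise_for_status() is part of the requests API and raises an HTTPError for any 4xx/5xx response:

import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')

# A 403 here means the server rejected the request itself;
# the page structure has not necessarily changed.
print(webpage.status_code)

# Or fail loudly instead of handing an error page to the parser:
webpage.raise_for_status()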

This is because when you try to access the page, the server denies you permission, so soup.find() returns None, and None has no find_all() attribute, which gives you the AttributeError.

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')

print(webpage.content)
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http://www.bbcgoodfood.com/search/recipes?" on this server.<P>
Reference #18.4fa9cd17.1428789762.680369dc
</BODY>
</HTML>
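More generally, whenever you chain calls like find(...).find_all(...), any step that can return None can blow up this way. A small defensive sketch, assuming soup was built from the page as above (the error message here is just illustrative):

container = soup.find("div", {"class": "main row grid-padding"})
if container is None:
    # The expected wrapper div is missing, e.g. because we received
    # an error page instead of the real search results.
    raise RuntimeError("Could not find the recipe container; "
                       "check the response content and status code.")

links = container.find_all("h2", {"class": "node-title"})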

If you resolve this by adding a header with a proper user agent, as @Vader suggested, your code will then run fine:

...
headers = {'User-agent': 'Mozilla/5.0'}
webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=', headers=headers)

soup = BeautifulSoup(webpage.content) 
links = soup.find("div",{"class":"main row grid-padding"}).find_all("h2",{"class":"node-title"})

for link in links:
    print(link.a["href"])

/recipes/4942/lemon-drizzle-cake
/recipes/3092/ultimate-chocolate-cake
/recipes/3228/chilli-con-carne
/recipes/3229/yummy-scrummy-carrot-cake
/recipes/1223/bestever-brownies
/recipes/1167651/chicken-and-chorizo-jambalaya
/recipes/2089/spiced-carrot-and-lentil-soup
/recipes/1521/summerinwinter-chicken
/recipes/1364/spicy-root-and-lentil-casserole
/recipes/4814/mustardstuffed-chicken
/recipes/4622/classic-scones-with-jam-and-clotted-cream
/recipes/333614/red-lentil-chickpea-and-chilli-soup
/recipes/5605/falafel-burgers
/recipes/11695/raspberry-bakewell-cake
/recipes/4686/chicken-biryani
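
As a side note, newer versions of Beautiful Soup emit a warning when no parser is named in the BeautifulSoup(...) call; passing one explicitly keeps the behaviour consistent across machines:

soup = BeautifulSoup(webpage.content, 'html.parser')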
