
Beautiful Soup code returning an “AttributeError”

I'm building a web scraper that returns the names of cafes, which appear in the website's HTML like this: <h2 class="venue-title" itemprop="name">Prior</h2>. However, it raises this error:

    "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
[Finished in 0.699s]

Here is the code:

from bs4 import BeautifulSoup
import requests

url = 'https://www.broadsheet.com.au/melbourne/guides/best-cafes-thornbury'
response = requests.get(url, timeout=5)

soup_cafe_list = BeautifulSoup(response.content, "html.parser")
type(soup_cafe_list)

cafes = soup_cafe_list.findAll('h2', attrs_={"class":"venue-title"}).text
print(cafes)

I have tried a whole range of things to figure it out. I suspect it has something to do with the findAll argument: cafes = soup_cafe_list.findAll('h2', attrs_={"class":"venue-title"}).text, because when I run it as cafes = soup_cafe_list.findAll('h2', class_="venue-title") instead, it sort of works, except that it doesn't return the items cleaned of their HTML, which I believe .text should do?

Another thing I'm noticing in the traceback is that it may be referring to a different directory for bs4. Could this have anything to do with it? I started off using Jupyter and am now on Atom, so I may have installed bs4 incorrectly:

  File "/Users/[xxxxxxxx]/Desktop/Coding/amvpscraper/webscraper.py", line 10, in <module>
    cafes = soup_cafe_list.findAll('h2', attrs_={"class":"venue-title"}).text
  File "/Users/[xxxxxxxx]/opt/anaconda3/lib/python3.7/site-packages/bs4/element.py", line 2081, in __getattr__

Not sure if I am doing something else wrong...

The error indicates that findAll returns a list of elements (a ResultSet), and a list has no text attribute. Store the result without calling .text on it, and replace attrs_ with attrs:

cafes = soup_cafe_list.findAll('h2', attrs={"class":"venue-title"})

Then iterate through the list and read the text of each element, for example with a list comprehension:

cafes = [el.text for el in cafes]
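
As a minimal, self-contained sketch of the difference between the two calls, the snippet below parses a small hand-written HTML string instead of fetching the live page. The markup is an assumption modelled on the element shown in the question, not the real page source, and find_all is used as the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the <h2> shown in the question.
html = """
<h2 class="venue-title" itemprop="name">Prior</h2>
<h2 class="venue-title" itemprop="name">Rat the Cafe</h2>
<h2 class="other">Not a venue</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tags), so .text is read per element.
titles = soup.find_all("h2", attrs={"class": "venue-title"})
cafes = [el.text for el in titles]
print(cafes)  # ['Prior', 'Rat the Cafe']

# find() returns a single Tag (or None if nothing matches), so .text works directly.
first = soup.find("h2", attrs={"class": "venue-title"})
print(first.text)  # Prior
```

Calling .text on the result of find_all() fails for the same reason that calling it on any Python list would: the attribute belongs to the individual Tag objects inside it.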

Edit: A list comprehension is a compact way of writing a for loop. You could also write:

res_list = []
for el in cafes:
    res_list.append(el.text)

Additionally, you may add a try-except clause, or a check for non-empty text, inside the loop to handle elements that have no text.

Output:

['Prior',
 'Rat the Cafe',
 'Ampersand Coffee and Food',
 'Umberto Espresso Bar',
 'Brother Alec',
 'Short Round',
 'Jerry Joy',
 'The Old Milk Bar',
 'Little Henri',
 'Northern Soul']
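
The optional check for empty text mentioned above could look something like the following sketch. The helper name extract_titles and the sample markup are hypothetical; get_text(strip=True) is used here so that whitespace-only headings are treated as empty.

```python
from bs4 import BeautifulSoup


def extract_titles(soup):
    """Return the non-empty text of every h2.venue-title element (hypothetical helper)."""
    titles = []
    for el in soup.find_all("h2", attrs={"class": "venue-title"}):
        text = el.get_text(strip=True)  # strip surrounding whitespace
        if text:  # skip headings that contain no visible text
            titles.append(text)
    return titles


# Hypothetical sample markup: the second heading is blank and gets skipped.
html = """
<h2 class="venue-title">Prior</h2>
<h2 class="venue-title">   </h2>
<h2 class="venue-title"> Brother Alec </h2>
"""
result = extract_titles(BeautifulSoup(html, "html.parser"))
print(result)  # ['Prior', 'Brother Alec']
```

A plain truthiness check is enough in this sketch because get_text() returns a string for every Tag, possibly an empty one.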

