I'm scraping a hotel website to retrieve titles and prices. "hotelInfo" is the div that holds the interesting content, so it makes sense to me to perform my operations only on this div. My code is as follows:
from bs4 import BeautifulSoup
import requests
response = requests.get("http://$hotelurlhere.com")
soup = BeautifulSoup(response.text)
hotelInfo = soup.select('div.hotel-wrap')
hotelTitle = soup.find_all('h3', attrs={'class': 'p-name'})
hotelNameList = []
hotelPriceList = []
for hotel in hotelInfo:
    for title in hotelTitle:
        hotelNameList.append(title.text)
It would make more sense for hotelTitle to be a BeautifulSoup search on hotelInfo above. However, when I try this:
hotelTitle = hotelInfo.find_all('h3', attrs={'class': 'p-name'})
Error message:

Traceback (most recent call last):
  File "main.py", line 8, in <module>
    hotelTitle = hotelInfo.find_all('h3', attrs={'class': 'p-name'})
AttributeError: 'list' object has no attribute 'find_all'
The error says that a list object has no attribute "find_all". I understand this is because hotelInfo is a list that was returned. I've searched for the correct way to find the h3 elements within this list, but I haven't had any success.
What is the best way to do this? Shouldn't I be able to set hotelTitle to hotelInfo.find_all rather than just soup.find_all?
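A small sketch of why the call fails: select() returns a list of matching elements (in recent bs4 versions a ResultSet, which is a list subclass), even when only one element matches, while find_all() is a method of an individual element. The inline HTML below is hypothetical and only mimics the div.hotel-wrap / h3.p-name structure from the question. select_one() is the bs4 method that returns just the first match, or None.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the real page; it only mimics
# the div.hotel-wrap / h3.p-name structure described in the question.
html = """
<div class="hotel-wrap"><h3 class="p-name">Hotel A</h3></div>
<div class="hotel-wrap"><h3 class="p-name">Hotel B</h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list of elements, so it has no find_all().
hotel_info = soup.select("div.hotel-wrap")
print(isinstance(hotel_info, list), len(hotel_info))

# select_one() returns the first matching element, or None if nothing matches.
first_hotel = soup.select_one("div.hotel-wrap")
missing = soup.select_one("div.no-such-class")

# find_all() works on a single element, searching only inside it.
first_titles = first_hotel.find_all("h3", class_="p-name")
print(first_titles[0].get_text())
```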
As the error message suggests, there is no find_all() method that you can call on a list object. Instead, call find_all() on each individual member of the list, assuming you need some information from the div.hotel-wrap as well as the corresponding h3:
for hotel in hotelInfo:
    hotelTitle = hotel.find_all('h3', attrs={'class': 'p-name'})
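A fuller sketch of the per-hotel approach, run against inline HTML instead of a live request. The h3.p-name markup follows the question; the span.p-price price element is a hypothetical assumption, since the question doesn't show the price markup. It uses find(), which returns the first match inside each hotel div (or None), so each hotel contributes exactly one name and one price and the two lists stay aligned.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: h3.p-name comes from the question, but the
# span.p-price element is an assumption made for illustration.
html = """
<div class="hotel-wrap">
  <h3 class="p-name">Hotel A</h3><span class="p-price">$120</span>
</div>
<div class="hotel-wrap">
  <h3 class="p-name">Hotel B</h3><span class="p-price">$95</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

hotelNameList = []
hotelPriceList = []
for hotel in soup.select("div.hotel-wrap"):
    # Search only inside this hotel's div, not the whole document.
    title = hotel.find("h3", class_="p-name")
    price = hotel.find("span", class_="p-price")
    # Record None for a missing field so both lists keep one entry per hotel.
    hotelNameList.append(title.get_text(strip=True) if title else None)
    hotelPriceList.append(price.get_text(strip=True) if price else None)

print(list(zip(hotelNameList, hotelPriceList)))
```

Scoping each search to one hotel div means a hotel with a missing price yields None in that slot rather than shifting every later price onto the wrong hotel.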
If you only need the h3 elements, you can combine the two selectors to get them directly, without having to find hotelInfo first:
hotelTitle = soup.select('div.hotel-wrap h3.p-name')
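A quick check of the combined selector on hypothetical inline HTML: the space in 'div.hotel-wrap h3.p-name' is the CSS descendant combinator, so an h3.p-name that sits outside any div.hotel-wrap is not matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one stray h3.p-name outside a hotel-wrap div.
html = """
<div class="hotel-wrap"><h3 class="p-name">Hotel A</h3></div>
<h3 class="p-name">Not inside a hotel-wrap</h3>
<div class="hotel-wrap"><h3 class="p-name">Hotel B</h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: h3.p-name anywhere inside a div.hotel-wrap.
hotelTitle = soup.select("div.hotel-wrap h3.p-name")
hotelNameList = [title.get_text(strip=True) for title in hotelTitle]
print(hotelNameList)  # prints ['Hotel A', 'Hotel B']
```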
for hotelinfo, hoteltitle in zip(hotelinfos, hoteltitles):
    data = {
        'hotelinfo': hotelinfo.get_text(),
    }
    print(data)

Something like that.
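A self-contained version of the zip() idea. The answer doesn't show where hotelinfos and hoteltitles come from, so here they are assumed to be built with two select() calls, and the span.p-price element in the inline HTML is hypothetical. zip() pairs elements by position and stops at the shorter list, so this only lines up correctly if every div.hotel-wrap contains exactly one h3.p-name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; span.p-price is an assumption for illustration.
html = """
<div class="hotel-wrap">
  <h3 class="p-name">Hotel A</h3><span class="p-price">$120</span>
</div>
<div class="hotel-wrap">
  <h3 class="p-name">Hotel B</h3><span class="p-price">$95</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumed definitions: the answer leaves these two lists undefined.
hotelinfos = soup.select("div.hotel-wrap")
hoteltitles = soup.select("div.hotel-wrap h3.p-name")

data = []
for hotelinfo, hoteltitle in zip(hotelinfos, hoteltitles):
    data.append({
        "hoteltitle": hoteltitle.get_text(strip=True),
        # Join all text inside the div with spaces, dropping blank strings.
        "hotelinfo": hotelinfo.get_text(" ", strip=True),
    })

print(data)
```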