I'm trying to figure out how to use Beautiful Soup and am having a hard time.
My HTML page has several elements that look like this:
<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1023"><span>The Westin Peachtree Plaza, Atlanta
</span></a>
<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1144"><span>Sheraton Atlanta Hotel
</span></a>
I'm trying to create an array with the hotel names. Here is my code so far:
import requests
from bs4 import BeautifulSoup
url = "removed"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
hotels = soup.find_all('a', class_="propertyName")
But I cannot figure out how to iterate over the hotels array to display the span element.
Your hotel names are inside a span element. One way to get them is the .select() method with a CSS selector:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1023"><span>The Westin Peachtree Plaza, Atlanta
... </span></a>
...
... <a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1144"><span>Sheraton Atlanta Hotel
... </span></a>
... ''', 'lxml')
>>> [element.get_text(strip=True) for element in soup.select('a.propertyName > span')]
['The Westin Peachtree Plaza, Atlanta', 'Sheraton Atlanta Hotel']
>>>
Alternatively, loop over the hotels list returned by find_all() and read the text of each span:
>>> hotels = soup.find_all('a', class_='propertyName')
>>> names = []
>>> for el in hotels:
...     names.append(el.find('span').get_text(strip=True))
...
>>> names
['The Westin Peachtree Plaza, Atlanta', 'Sheraton Atlanta Hotel']
>>>
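If it helps, the loop approach can be wrapped in a small function. This is only a sketch: the name extract_hotel_names, the SAMPLE_HTML constant, and the third link without a span are made up for illustration, and it uses the built-in "html.parser" rather than lxml so that nothing besides bs4 is needed. The None check is there because el.find('span') returns None when a link has no span, which would otherwise raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, plus one hypothetical link with no span.
# On a real page this string would come from requests.get(url).text.
SAMPLE_HTML = """
<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1023"><span>The Westin Peachtree Plaza, Atlanta
</span></a>
<a class="propertyName" href="/preferredguest/property/overview/index.html?propertyID=1144"><span>Sheraton Atlanta Hotel
</span></a>
<a class="propertyName" href="#"></a>
"""


def extract_hotel_names(html):
    """Return the stripped span text of every a.propertyName link."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for link in soup.find_all("a", class_="propertyName"):
        span = link.find("span")
        if span is None:
            # Skip links without a span instead of failing on None.
            continue
        names.append(span.get_text(strip=True))
    return names


print(extract_hotel_names(SAMPLE_HTML))
```

With the sample markup above this should print the same two hotel names as the earlier snippets, since the link without a span is skipped.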