I'm new to Python and hope to scrape real estate data from a listings website. I've succeeded in pulling text from the page, but the returned object is not what I expected.
# import modules
from lxml import html
import requests
# specify webpage to scrape
url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)
# scrape desired information
address_raw = tree.xpath('//a[@class="search-nav-link"]//text()')
price_raw = tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')
As expected, the objects address_raw and price_raw are lists. However, the values in these lists are not displayed as strings showing the scraped addresses and prices. Instead, each one appears as [_ElementUnicodeResult object of lxml.etree module]. Typing the object name (e.g., address_raw) into the interpreter does show the addresses in the list, as does print(address_raw). How can I create a simple list of addresses and prices as strings, so that the values no longer show up as [_ElementUnicodeResult object of lxml.etree module]?
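For context, lxml returns XPath text results as "smart strings" (the _ElementUnicodeResult class), which is a subclass of Python's str. The values are therefore already usable as strings; the bracketed label is most likely just how the IDE's variable viewer describes their type. A minimal offline check, assuming a small hand-written HTML snippet (hypothetical) in place of the real listings page:

```python
from lxml import html

# Hypothetical markup standing in for the listings page;
# the class name is copied from the question's XPath expression
snippet = (
    '<div>'
    '<a class="search-nav-link">123 Main St</a>'
    '<a class="search-nav-link">456 Oak Ave</a>'
    '</div>'
)

tree = html.fromstring(snippet)
addresses = tree.xpath('//a[@class="search-nav-link"]//text()')

for value in addresses:
    # The concrete type is lxml's smart-string class (shown in the question
    # as _ElementUnicodeResult), not plain str...
    print(type(value).__name__)
    # ...but it subclasses str, so isinstance checks and string methods work
    print(isinstance(value, str), value.upper())
```

So printing, comparing or calling string methods on these values works without any conversion; converting is only needed if you want objects whose type is exactly str.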
You can use str() to convert each object to a plain string and map() to apply that function to every element of the list:
from lxml import html
import requests
url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)
address_raw = list(map(str, tree.xpath('//a[@class="search-nav-link"]//text()')))
price_raw = list(map(str, tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')))
print(type(address_raw[0])) # => <class 'str'>
print(type(price_raw[0])) # => <class 'str'>
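An alternative sketch, assuming lxml's smart_strings option: passing smart_strings=False to xpath() asks lxml to return plain str objects directly, which makes the map(str, ...) step unnecessary. Scraped text nodes also often carry surrounding whitespace or come back as blank entries, so the sketch strips them too. The markup below is a hypothetical stand-in for the real page; only the class names are taken from the question.

```python
from lxml import html

# Hypothetical markup; class names copied from the question's XPath expressions
snippet = (
    '<div>'
    '<a class="search-nav-link">\n  123 Main St  </a>'
    '<span class="font-weight-bold listing-price d-block pull-left pr-25"> $1,000,000 </span>'
    '</div>'
)
tree = html.fromstring(snippet)

# smart_strings=False makes xpath() return plain str objects
# instead of lxml's smart-string subclass
address_xpath = '//a[@class="search-nav-link"]//text()'
price_xpath = '//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()'
address_raw = tree.xpath(address_xpath, smart_strings=False)
price_raw = tree.xpath(price_xpath, smart_strings=False)

# Strip surrounding whitespace and drop text nodes that are empty after stripping
addresses = [s.strip() for s in address_raw if s.strip()]
prices = [s.strip() for s in price_raw if s.strip()]

print(addresses)                    # ['123 Main St']
print(prices)                       # ['$1,000,000']
print(type(address_raw[0]) is str)  # True
```

Disabling smart strings also means the results no longer keep a reference back to their parent element (getparent()), which is usually fine when you only need the text.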