
Create list object from “_ElementUnicodeResult object of lxml.etree module”

I'm new to Python and hope to scrape real estate data from a listings website. I've succeeded in pulling text from the page, but the returned object is not what I expected.


# import modules
from lxml import html
import requests

# specify webpage to scrape
url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)

# scrape desired information
address_raw = tree.xpath('//a[@class="search-nav-link"]//text()')
price_raw = tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')

As expected, address_raw and price_raw are lists. However, the values in these lists are not displayed as plain strings containing the addresses and prices. Instead, each one shows up as [_ElementUnicodeResult object of lxml.etree module]. Typing the object name (e.g., address_raw) into the interpreter does show the addresses in the list, and so does print(address_raw). How can I create a simple list of addresses and prices as plain strings, so that the values no longer appear as [_ElementUnicodeResult object of lxml.etree module]?
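The following is a minimal offline sketch of what the question describes. It uses a tiny hypothetical HTML snippet in place of the real listings page, so no request is made. Each item that xpath() returns for a text() query is an _ElementUnicodeResult. That class is a subclass of str that also remembers which element the text came from. The [_ElementUnicodeResult object of lxml.etree module] label is just how the values were being displayed; the text itself is already there.

```python
from lxml import html

# Hypothetical markup standing in for the real listings page
# (the original code fetched the page with requests instead).
snippet = '<div><a class="search-nav-link" href="#">123 Main St</a></div>'
tree = html.fromstring(snippet)

address_raw = tree.xpath('//a[@class="search-nav-link"]//text()')
first = address_raw[0]

print(type(first).__name__)    # _ElementUnicodeResult
print(isinstance(first, str))  # True: it is already a str subclass
print(first.getparent().tag)   # a  (the element this text belongs to)
print(first.upper())           # 123 MAIN ST  (ordinary str methods work)
```

Because these objects are already strings, comparisons and string methods work on them directly. Converting them with str() only matters when you want exact str instances without the extra lxml metadata.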

You can use str() to convert each result to a plain string, and map() to apply str() to every element of the list:

from lxml import html
import requests

url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)

# map(str, ...) converts every _ElementUnicodeResult into a plain str
address_raw = list(map(str, tree.xpath('//a[@class="search-nav-link"]//text()')))
price_raw = list(map(str, tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')))
print(type(address_raw[0])) # => <class 'str'>
print(type(price_raw[0]))   # => <class 'str'>
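An alternative sketch follows. It again uses hypothetical markup and a simplified class name rather than the real page. lxml's xpath() method accepts a smart_strings=False keyword, which makes text results come back as plain str objects in the first place. The price query uses a list comprehension instead. That does the same job as list(map(str, ...)), and it also strips surrounding whitespace and skips whitespace-only text nodes, which //text() queries often return.

```python
from lxml import html

# Hypothetical markup standing in for one listing on the real page.
snippet = """<div>
  <a class="search-nav-link" href="#">123 Main St</a>
  <span class="listing-price">
    $1,250,000
  </span>
</div>"""
tree = html.fromstring(snippet)

# smart_strings=False: xpath() returns plain str objects, so no
# conversion step is needed afterwards.
address_raw = tree.xpath('//a[@class="search-nav-link"]//text()',
                         smart_strings=False)

# List comprehension alternative to list(map(str, ...)): .strip() returns
# a plain str, and the if-clause drops whitespace-only text nodes.
price_raw = [
    s.strip()
    for s in tree.xpath('//span[@class="listing-price"]//text()')
    if s.strip()
]

print(address_raw)  # ['123 Main St']
print(price_raw)    # ['$1,250,000']
print(type(address_raw[0]) is str, type(price_raw[0]) is str)  # True True
```

Which approach to use is mostly a matter of taste. smart_strings=False avoids the extra conversion, while str() or strip() keeps the default xpath() behaviour and lets you clean the text in the same step.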
