
Python XPath returns empty list - Exalead

I'm fairly new to scraping with Python. I am trying to obtain the number of search results from a query on Exalead. In this example I would like to get "586,564 results".

This is the code I am running:

import requests
from lxml import html

r = requests.get(URL, headers=headers)
tree = html.fromstring(r.text)
stats = tree.xpath('//*[@id="searchform"]/div/div/small/text()')

This returns an empty list.

I copy-pasted the XPath directly from the browser's Elements panel.
As an alternative, I have tried using BeautifulSoup:

html = r.text
soup = BeautifulSoup(html, 'xml')
stats = soup.find('small', {'class': 'pull-right'}).text

which raises an AttributeError: 'NoneType' object has no attribute 'text'.

When I checked the HTML source, I realised that I cannot actually find the element I am looking for (the number of results) in the source.

Does anyone know why this is happening and how this can be resolved? Thanks a lot!

"When I checked the HTML source, I realised that I cannot actually find the element I am looking for (the number of results) in the source."

This suggests that the data you're looking for is generated dynamically with JavaScript. requests only downloads the HTML the server sends and does not execute JavaScript, so an element has to be present in that HTML source before lxml or BeautifulSoup can find it.

To confirm that this is the cause of your error, you could try something really simple like:

html = r.text
soup = BeautifulSoup(html, 'lxml')

Note the 'lxml' parser above, rather than the 'xml' parser used in your code; 'xml' is meant for XML documents, not HTML.

Then inspect soup manually (for example with print(soup.prettify())) to see whether your desired element is there.
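
The manual check can also be scripted. The following is only a minimal sketch that uses the standard library's html.parser instead of lxml or BeautifulSoup; TagFinder and find_in_source are hypothetical helper names, and the tag and class (small, pull-right) are taken from the question rather than verified against the live page.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the text of every <tag> whose class attribute contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []   # text content of each matching element
        self._depth = 0     # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already inside a match: track nested tags of the same name.
            if tag == self.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data


def find_in_source(markup, tag, css_class):
    """Return the stripped text of all matching elements found in raw HTML."""
    finder = TagFinder(tag, css_class)
    finder.feed(markup)
    finder.close()
    return [text.strip() for text in finder.matches]
```

With a real response this would be called as find_in_source(r.text, "small", "pull-right"). An empty list means the element is not in the HTML the server sent, which points to JavaScript rendering (or a different response for your request headers) rather than a wrong selector.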

I can get that with the CSS selector small.pull-right, which combines the tag name (small) with the element's class name (pull-right).

from bs4 import BeautifulSoup
import requests
url = 'https://www.exalead.com/search/web/results/?q=lead+poisoning'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
print(soup.select_one('small.pull-right').text)
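
select_one returns None when nothing matches, which is what produced the AttributeError in the question. A hedged variant that guards against this is sketched below; result_count is a hypothetical helper name, and it uses the standard-library "html.parser" backend as an assumption so that it runs without lxml installed ("lxml" works the same way here).

```python
from bs4 import BeautifulSoup


def result_count(markup):
    """Return the text of the first <small class="pull-right">, or None if absent."""
    # "html.parser" ships with Python; swap in "lxml" if it is installed.
    soup = BeautifulSoup(markup, "html.parser")
    node = soup.select_one("small.pull-right")
    # select_one returns None when nothing matches, so check before reading text.
    return node.get_text(strip=True) if node is not None else None
```

Called as result_count(res.content), it returns the result text when the element is present in the response and None otherwise, so the caller can decide how to handle a page that did not include the count.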


 