
Python XPath returns empty list - Exalead

I'm fairly new to scraping with Python. I am trying to obtain the number of search results for a query on Exalead. In this example I would like to get "586,564 results".

This is the code I am running:

import requests
from lxml import html

r = requests.get(URL, headers=headers)  # URL and headers are defined elsewhere
tree = html.fromstring(r.text)
stats = tree.xpath('//*[@id="searchform"]/div/div/small/text()')

This returns an empty list.

I copy-pasted the XPath directly from the browser's Elements panel.
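
For reference, an XPath copied from the browser for an element nested under an element with an id typically starts with `//*[@id="..."]`; the `*` node test is required, because `//[@id="searchform"]` on its own is not a valid XPath expression. The sketch below runs the expression against a small static HTML string whose shape is an assumption made up to match the copied path (it is not Exalead's real markup), to show that the query itself works when the element is actually present in the HTML.

```python
from lxml import html

# Hypothetical markup shaped like the copied XPath; not Exalead's real page.
SAMPLE_PAGE = """
<html><body>
  <div id="searchform">
    <div><div><small class="pull-right">586,564 results</small></div></div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE_PAGE)
# "//*[@id=...]" means "any element with this id"; the "*" cannot be dropped.
stats = tree.xpath('//*[@id="searchform"]/div/div/small/text()')
print(stats)  # ['586,564 results']
```
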
As an alternative, I have tried using Beautiful Soup:

from bs4 import BeautifulSoup

html = r.text
soup = BeautifulSoup(html, 'xml')
stats = soup.find('small', {'class': 'pull-right'}).text

which raises AttributeError: 'NoneType' object has no attribute 'text'.

When I checked the HTML source, I realised that I actually cannot find the element I am looking for (the number of results) in the source.

Does anyone know why this is happening and how it can be resolved? Thanks a lot!

"When I checked the HTML source, I realised that I actually cannot find the element I am looking for (the number of results) in the source."

This suggests that the data you're looking for is generated dynamically with JavaScript. The requests library only downloads the HTML that the server sends and does not run any JavaScript, so lxml and BeautifulSoup can only find the element if it is present in that HTML source.
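
To illustrate why an element can be visible in the browser yet missing for requests, here is a hypothetical raw page (an assumption, not Exalead's actual source) in which the result count is only added by a script at runtime. A parser sees the number inside the script's text but finds no `<small>` element, because nothing executes the script.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML: the result count is only inserted by JavaScript.
RAW_HTML = """
<html><body>
  <div id="searchform"></div>
  <script>
    var s = document.createElement("small");
    s.className = "pull-right";
    s.textContent = "586,564 results";
    document.getElementById("searchform").appendChild(s);
  </script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "lxml")
# No <small> element exists in the parsed tree; only a browser would create it.
print(soup.select_one("small.pull-right"))  # None
# The number is present only as part of the script's source text.
print("586,564" in RAW_HTML)  # True
```
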

To confirm that this is the cause of your error, you could try something really simple like:

from bs4 import BeautifulSoup

html = r.text
soup = BeautifulSoup(html, 'lxml')

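Note the 'lxml' above: it selects lxml's HTML parser, whereas the 'xml' in your code selects BeautifulSoup's XML parser, which is not meant for HTML pages.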

Then inspect 'soup' manually to see whether the element you want is there.
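
Instead of reading through the whole 'soup', a small check can report whether the target element exists in the downloaded HTML. A minimal sketch, assuming the small.pull-right selector used in the other answer; the function name element_in_source is hypothetical. In practice you would call it as element_in_source(r.text).

```python
from bs4 import BeautifulSoup


def element_in_source(raw_html, selector="small.pull-right"):
    """Return True if `selector` matches an element in the raw HTML.

    Hypothetical helper: it only sees what the server sent, so elements
    that JavaScript adds later in the browser will not be found.
    """
    soup = BeautifulSoup(raw_html, "lxml")  # HTML parser, not 'xml'
    return soup.select_one(selector) is not None


print(element_in_source('<small class="pull-right">586,564 results</small>'))  # True
print(element_in_source('<div id="searchform"></div>'))  # False
```
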

I can get it with the CSS selector small.pull-right, which combines the tag name and the class name of the element.

from bs4 import BeautifulSoup
import requests
url = 'https://www.exalead.com/search/web/results/?q=lead+poisoning'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
print(soup.select_one('small.pull-right').text)
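
Since the question started from XPath, the same lookup can also be written with lxml. The sketch below uses the common XPath idiom for a CSS class selector (matching "pull-right" as a whole word in the space-separated class attribute) and then converts the text to an integer. The function name result_count and the assumption that the text looks like "586,564 results" are illustrative only; the live page may be formatted differently.

```python
import re

from lxml import html

# XPath equivalent of the CSS selector "small.pull-right".
PULL_RIGHT_SMALL = (
    '//small[contains(concat(" ", normalize-space(@class), " "), " pull-right ")]'
)


def result_count(raw_html):
    """Return the result count as an int, or None if it cannot be found.

    Hypothetical helper; assumes text such as "586,564 results".
    """
    tree = html.fromstring(raw_html)
    nodes = tree.xpath(PULL_RIGHT_SMALL)
    if not nodes:
        return None
    # First run of digits, allowing thousands separators such as "586,564".
    match = re.search(r"\d[\d,]*", nodes[0].text_content())
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


print(result_count('<div><small class="text-muted pull-right"> 586,564 results</small></div>'))  # 586564
```
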


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM