
Python: Using xpath locally / on a specific element

I'm trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression to the whole page I'll capture links that I don't want.

For example:

import lxml.html

# some_response is a filename, URL, or file-like object holding the page
tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

The problem is that this applies the expression to the whole document. I located the element I want, for example:

tree = lxml.html.parse(some_response)
root = tree.getroot()
table = root[1][5] #for example
links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

But that seems to run the query against the whole document as well, since I'm still capturing the links outside the table. This page says that "When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute):". So what I'm using is an absolute expression and I need to make it relative? Is that it?

Basically, how can I filter for only the elements that exist inside this table?

Your xpath starts with a slash ( / ) and is therefore absolute. Add a dot ( . ) in front to make it relative to the current element, i.e.:

links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
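To see the difference side by side, here is a minimal sketch. The HTML string, the `expr` variable, and the way the table is located are made up for illustration; they are not taken from the asker's page.

```python
import lxml.html

# Hypothetical markup: one matching link outside the table, one inside it.
HTML = """
<html><body>
  <a href="http://www.example.com/filter/outside">outside</a>
  <table id="my_table_id">
    <tr><td><a href="http://www.example.com/filter/inside">inside</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(HTML)
table = root.xpath("//table")[0]

expr = "a[contains(@href, 'http://www.example.com/filter/')]"

# Absolute: '//' starts at the document root, even when called on `table`.
absolute_hrefs = [a.get("href") for a in table.xpath("//" + expr)]

# Relative: './/' only searches the descendants of `table`.
relative_hrefs = [a.get("href") for a in table.xpath(".//" + expr)]

print(absolute_hrefs)
print(relative_hrefs)
```

The absolute query should return both links, while the relative one should return only the link inside the table.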

Another option would be to ask directly for elements inside your table. For instance:

tree = lxml.html.parse(some_response)
links = tree.xpath("//table[**criteria**]//a[contains(@href, 'http://www.example.com/filter/')]")

Here **criteria** is only needed if there is more than one table in the page. Some possible criteria would be to filter on the table's id or class. For instance:

links = tree.xpath("//table[@id='my_table_id']//a[contains(@href, 'http://www.example.com/filter/')]")
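A small sketch of both kinds of criteria follows, assuming a made-up page with two tables. The `results` class name and the sample HTML are hypothetical. The class test uses the usual XPath 1.0 idiom for matching one token in a space-separated class attribute, since a plain `@class='results'` would miss `class="results wide"`.

```python
import lxml.html

# Hypothetical page with two tables; only the first is the one we want.
HTML = """
<html><body>
  <table id="my_table_id" class="results wide">
    <tr><td><a href="http://www.example.com/filter/a">a</a></td></tr>
  </table>
  <table id="other_table">
    <tr><td><a href="http://www.example.com/filter/b">b</a></td></tr>
  </table>
</body></html>
"""

tree = lxml.html.fromstring(HTML)
link_test = "a[contains(@href, 'http://www.example.com/filter/')]"

# Select the table by its id attribute.
by_id = tree.xpath("//table[@id='my_table_id']//" + link_test)

# Select the table by one of its class tokens (hypothetical class 'results').
class_test = "contains(concat(' ', normalize-space(@class), ' '), ' results ')"
by_class = tree.xpath("//table[" + class_test + "]//" + link_test)

by_id_hrefs = [a.get("href") for a in by_id]
by_class_hrefs = [a.get("href") for a in by_class]
print(by_id_hrefs)
print(by_class_hrefs)
```

Both queries should pick up only the link in the first table. The id test is the simpler choice when the table has a stable id.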


 