XPath: What do nested square brackets mean?

I'm learning XPath for web scraping and stumbled across these two XPath examples:

//div[@class="head"][@id="top"]

and

//div[@class='canvas- graph']//a[@href='/accounting.html'][i[@class='icon-usd']]/following-sibling::h4

I wonder what div[@class="head"][@id="top"] means. Does it mean that the @id=top property belongs to the div element? Is it the same as //div[@class="head" and @id="top"]?
And what does it mean when square brackets are nested inside one another, as in the second example? What would the HTML DOM have to look like for the second XPath expression to match it?

Square brackets delimit predicates†, and predicates filter items††.

You anticipate two ways in which predicates can be combined:

  1. Consecutively: Yes, this is equivalent to logically and-ing the predicates, as long as neither predicate depends on position (each predicate sees the positions of the nodes that survived the previous one, so div[1][@id="top"] and div[@id="top"][1] can select different nodes). So, correct, //div[@class="head"][@id="top"] is equivalent to //div[@class="head" and @id="top"].

  2. Recursively: Yes, XPath allows predicates within predicates (nesting, as you observe).

    So, a[@href='/accounting.html'][i[@class='icon-usd']] selects those a elements that have an @href attribute value equal to '/accounting.html' and that have a child i element whose @class attribute value equals 'icon-usd'.
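The equivalence in item 1 can be checked with Python's standard-library xml.etree.ElementTree, whose limited XPath subset accepts chained [@attr='value'] predicates but has no and operator. The sketch below, using a small made-up document, compares the chained form with the same two conditions written out in Python; it illustrates the idea and is not a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Made-up document: only the attribute combinations matter here.
DOC = """
<body>
  <div class="head" id="top">A</div>
  <div class="head" id="other">B</div>
  <div class="body" id="top">C</div>
</body>
"""

root = ET.fromstring(DOC.strip())

# Consecutive predicates: ElementTree applies each [@attr='value'] filter
# to the result of the previous one.
chained = root.findall(".//div[@class='head'][@id='top']")

# ElementTree has no `and`, so the combined test is spelled out in Python;
# this mirrors //div[@class="head" and @id="top"].
combined = [
    d for d in root.iter("div")
    if d.get("class") == "head" and d.get("id") == "top"
]

print([d.text for d in chained])  # ['A']
print(chained == combined)        # True (same Element objects, same order)
```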

Together, these composition mechanisms provide a powerful means of building predicates out of more basic conditions.
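As for what a matching DOM could look like: the sketch below uses a hypothetical fragment, written as well-formed XML so that the standard-library ElementTree can parse it (real-world HTML would normally go through an HTML parser first). ElementTree cannot evaluate nested predicates or the following-sibling axis, so both are emulated by hand. The inner path i[@class='icon-usd'] is evaluated relative to each candidate a, and the a is kept if that yields at least one node, which is how XPath converts a node-set to a boolean. The outer //div[@class='canvas- graph'] step is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped to match
#   a[@href='/accounting.html'][i[@class='icon-usd']]/following-sibling::h4
DOC = """
<root>
  <div>
    <a href="/accounting.html"><i class="icon-usd"/>Accounting</a>
    <h4>Revenue</h4>
  </div>
  <div>
    <a href="/accounting.html"><i class="icon-eur"/>Accounting</a>
    <h4>Not selected</h4>
  </div>
</root>
"""

root = ET.fromstring(DOC.strip())

# ElementTree has no parent pointers, so build a child -> parent map
# to emulate the following-sibling axis.
parent_of = {child: parent for parent in root.iter() for child in parent}

def following_siblings(elem, tag):
    """Siblings after `elem` that have the given tag, in document order."""
    siblings = list(parent_of[elem])
    return [s for s in siblings[siblings.index(elem) + 1:] if s.tag == tag]

# Outer predicate 1: @href equals '/accounting.html' (ElementTree handles this).
candidates = root.findall(".//a[@href='/accounting.html']")

# Outer predicate 2 is itself a path with its own predicate: keep an <a> only
# if it has a child <i> whose @class is 'icon-usd' (non-empty result -> true).
matches = [a for a in candidates if a.find("i[@class='icon-usd']") is not None]

# Final step: following-sibling::h4 of each matching <a>.
result = [h4 for a in matches for h4 in following_siblings(a, "h4")]
print([h.text for h in result])  # ['Revenue']
```

Only the first a qualifies, because the second one's i child has a different @class, so only the h4 after it is selected.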


† Predicate references: XPath 1.0, XPath 3.1.
†† Node-sets in XPath 1.0; sequences in XPath 2.0+.

The square brackets are called a predicate. The XPath 1.0 specification (section 2.4, Predicates) describes them as follows:

A predicate filters a node-set with respect to an axis to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated with that node as the context node, with the number of nodes in the node-set as the context size, and with the proximity position of the node in the node-set with respect to the axis as the context position; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included.

A PredicateExpr is evaluated by evaluating the Expr and converting the result to a boolean. If the result is a number, the result will be converted to true if the number is equal to the context position and will be converted to false otherwise; if the result is not a number, then the result will be converted as if by a call to the boolean function. Thus a location path para[3] is equivalent to para[position()=3].
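The rule quoted above can be mirrored in a few lines of Python. The hypothetical apply_predicate function below takes a list of nodes (assumed to be already in forward-axis order, so the proximity position is simply the 1-based index) and a predicate body that receives the context node, context position and context size. A numeric result is compared with the position; anything else goes through bool(), standing in for XPath's boolean(). This is a teaching sketch of the semantics, not how any particular XPath processor is implemented.

```python
def apply_predicate(nodes, pred):
    """Filter `nodes` per the XPath 1.0 predicate rule (forward axis assumed)."""
    size = len(nodes)  # context size
    kept = []
    for position, node in enumerate(nodes, start=1):  # context position
        value = pred(node, position, size)
        # bool is a subclass of int in Python, so check it before numbers.
        if isinstance(value, bool):
            keep = value
        elif isinstance(value, (int, float)):
            keep = value == position  # a number means "position() = number"
        else:
            keep = bool(value)  # stand-in for XPath's boolean()
        if keep:
            kept.append(node)
    return kept

paras = ["p1", "p2", "p3", "p4"]

def three(node, pos, size):
    return 3                   # para[3]

def pos_is_three(node, pos, size):
    return pos == 3            # para[position()=3]

def last(node, pos, size):
    return size                # para[last()]: uses the context size

by_number = apply_predicate(paras, three)
by_test = apply_predicate(paras, pos_is_three)
last_one = apply_predicate(paras, last)
print(by_number, by_test, last_one)  # ['p3'] ['p3'] ['p4']
```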

Inside the predicate, you test whether a condition is true or false as a means of filtering the set of items selected to the left of the predicate. Think of it like a SQL WHERE clause.

You can put multiple test criteria within a single predicate, or you can use multiple predicates. Choosing multiple predicates over a single predicate that combines several tests with and may offer some advantage, either for performance tuning or for clarity.
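The choice is not purely stylistic once position is involved: each predicate in a chain gets its context position and size from the node-set produced by the predicate before it, whereas tests combined with and inside one predicate all share the same position. The sketch below models this with hypothetical (name, class) tuples standing in for div elements and a minimal, hypothetical filter_step helper that applies one predicate per the rule quoted earlier. Here div[@class='head'][1] selects a node, while div[1][@class='head'] and div[@class='head' and position()=1] select nothing.

```python
# Hypothetical (name, class) tuples standing in for three sibling <div>s.
divs = [("d1", "other"), ("d2", "head"), ("d3", "head")]

def filter_step(nodes, pred):
    """Apply one predicate; position and size come from `nodes` itself."""
    size = len(nodes)
    out = []
    for pos, node in enumerate(nodes, start=1):
        value = pred(node, pos, size)
        if not isinstance(value, bool) and isinstance(value, (int, float)):
            value = value == pos  # a numeric result means position() = value
        if value:
            out.append(node)
    return out

def is_head(node, pos, size):
    return node[1] == "head"  # [@class='head']

def first(node, pos, size):
    return 1                  # [1]

# div[@class='head'][1]: filter by class, then take the first survivor.
head_then_first = filter_step(filter_step(divs, is_head), first)
# div[1][@class='head']: take the first div, then test its class.
first_then_head = filter_step(filter_step(divs, first), is_head)
# div[@class='head' and position()=1]: both tests share one context position.
combined = filter_step(divs, lambda n, pos, size: n[1] == "head" and pos == 1)

print(head_then_first, first_then_head, combined)  # [('d2', 'head')] [] []
```

For tests that ignore position, such as @class and @id, the chained and combined forms select the same nodes, which is why the two spellings in the question are interchangeable.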
