
XPath query result order

For another question I wrote some XML-related code that works on my development machine but not on Viper Codepad, where I tested it before adding it to my answer.

I have narrowed the problem down to this: the order of the nodes returned by DOMXPath::query() differs between my system and the codepad.

XML: <test>This is some <span>text</span>, fine.</test>

When I query all text nodes with //child::text(), the result differs:

Viper Codepad:

#0: This is some 
#1: , fine.
#2: text

My Machine:

#0: This is some 
#1: text
#2: , fine.

I'm not experienced enough with XPath to understand why this happens, or whether the return order can be influenced in the PHP implementation.
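For reference, the setup described in the question can be reproduced with a short script along these lines. It is a minimal sketch, assuming PHP 7 or later; the helper name queryTextNodes() is made up for illustration and is not part of the original code. The order it prints depends on the libxml2 build behind PHP's DOM extension.

```php
<?php
// Hypothetical helper: returns the text of every node matched by the XPath
// expression, in the order in which DOMXPath::query() hands the nodes back.
function queryTextNodes(string $xml, string $expression): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $values = [];
    foreach ($xpath->query($expression) as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

$xml = '<test>This is some <span>text</span>, fine.</test>';
foreach (queryTextNodes($xml, '//child::text()') as $i => $value) {
    printf("#%d: %s\n", $i, $value);
}
```', '//child::text()');
assert(count($got) === 3);
sort($got);
$expected = ['This is some ', 'text', ', fine.'];
sort($expected);
assert($got === $expected);
</test>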

Edit:

Further testing has revealed that LIBXML_VERSION differs between the two systems:

Viper Codepad: 20626 (2.6.26; 6 Jun 2006)
My Machine...: 20707 (2.7.7; 15 Mar 2010)
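The numeric and dotted forms above map onto each other as major * 10000 + minor * 100 + patch. A small sketch for checking which libxml2 a PHP installation is using, assuming PHP 7 or later for intdiv(); the function name libxmlVersionToDotted() is invented for this example:

```php
<?php
// Hypothetical helper: splits libxml2's numeric version
// (major * 10000 + minor * 100 + patch) into a dotted string.
function libxmlVersionToDotted(int $version): string
{
    $major = intdiv($version, 10000);
    $minor = intdiv($version, 100) % 100;
    $patch = $version % 100;
    return sprintf('%d.%d.%d', $major, $minor, $patch);
}

// Both constants are provided by PHP's libxml extension.
echo libxmlVersionToDotted(LIBXML_VERSION), "\n";
echo LIBXML_DOTTED_VERSION, "\n";
```

Called with the two values from the question, libxmlVersionToDotted(20626) gives "2.6.26" and libxmlVersionToDotted(20707) gives "2.7.7".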

Technically XPath 1.0 returns node-sets rather than node sequences. In the XPath 1.0 specification there is no statement about the order of these node-sets - indeed, being sets, they have no intrinsic order.

However, XSLT 1.0 always processes the node-sets returned by XPath 1.0 in document order, and because of that precedent, there is a widespread expectation that XPath results will be in document order when XPath is invoked from languages other than XSLT. However, there is nothing in the spec to guarantee this. In XPath 2.0 the user expectation becomes part of the spec, and the results of a path expression MUST be in document order.

I found the following bug report, which looks like this issue: Bug 363252 - proximity position in libxml2's xmlXPathEvalExpression(), reported 18 Oct 2006 and confirmed to date back to May 2006, which is before the 2.6.26 version in question.

This should have been fixed in libxml2 2.6.27.

It looks like a bug in version 20626:

It first processes all child text nodes in document order, then the content of the child element nodes. The result should be the same as on your machine.

XPath is a query language, so it should only read the structure of the XML document as it is and never modify it. This includes the node order. In your first example, however, that is not the case, so this is definitely a bug according to this.

It appears that Viper Codepad is not returning the selected text() nodes in depth-first document order, but is doing a breadth-first evaluation instead.

It is supposed to be a depth-first traversal.

Saxon, MSXML, and Altova XML each returned the results in depth-first order.
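The two orders can be illustrated by walking the DOM by hand rather than through XPath. This is only a sketch of the two traversal strategies, assuming PHP 7 or later, and is not how libxml2 evaluates the expression internally. Both function names are invented for the example, and only plain text nodes are collected (CDATA sections are ignored).

```php
<?php
// Depth-first (document order): descend into each child element before
// moving on to its next sibling.
function textNodesDepthFirst(DOMNode $node): array
{
    $values = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $values[] = $child->nodeValue;
        } elseif ($child->hasChildNodes()) {
            $values = array_merge($values, textNodesDepthFirst($child));
        }
    }
    return $values;
}

// Breadth-first: collect all text children of a node first, and only then
// look inside its child elements (queued for later).
function textNodesBreadthFirst(DOMNode $root): array
{
    $values = [];
    $queue = [$root];
    while ($queue) {
        $node = array_shift($queue);
        foreach ($node->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $values[] = $child->nodeValue;
            } elseif ($child->hasChildNodes()) {
                $queue[] = $child;
            }
        }
    }
    return $values;
}

$doc = new DOMDocument();
$doc->loadXML('<test>This is some <span>text</span>, fine.</test>');

print_r(textNodesDepthFirst($doc));   // "This is some ", "text", ", fine."
print_r(textNodesBreadthFirst($doc)); // "This is some ", ", fine.", "text"
```

The depth-first walk matches the output from the 2.7.7 machine, and the breadth-first walk matches what Viper Codepad printed. On a system stuck with an affected libxml2, a manual depth-first walk like this could serve as a fallback for collecting text nodes in document order.');
assert(textNodesDepthFirst($testDoc) === ['This is some ', 'text', ', fine.']);
assert(textNodesBreadthFirst($testDoc) === ['This is some ', ', fine.', 'text']);
</test>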
