Unexpected Result from PHP DOMXPath query

Question

I have an XML document with this structure:

<realestates:realEstates xmlns:ns2="http://rest.immobilienscout24.de/schema/platform/gis/1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:common="http://rest.immobilienscout24.de/schema/common/1.0" xmlns:realestates="http://rest.immobilienscout24.de/schema/offer/realestates/1.0">
  <realEstateList>
    <typeList>
      <realEstateElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:offerlistelement="http://rest.immobilienscout24.de/schema/offer/listelement/1.0">
        <address>
          <postcode>31874</postcode>
        </address>
      </realEstateElement>
    </typeList>
  </realEstateList>
</realestates:realEstates>

Now I want to select all realEstateElement elements whose postcode does not start with, e.g., 31, because I want to remove them from the document.

I tried to select all matching elements with this XPath expression:

typeList//realEstateElement/address[starts-with(postcode,"31")]

But I get nothing. If I remove typeList at the beginning, I get all matching postcode elements and not the realEstateElement elements. Does anybody have an idea how I can remove all non-matching elements in a simple way?

Thanks!

Answer 1

This XPath expression:

//realEstateElement/address[starts-with(postcode,"31")]

selects the address nodes inside every realEstateElement node in the document. It is an XPath expression with two steps. The last step is always the one you are selecting; the preceding steps only establish the context for it. Each step can have one or more predicates of the form [boolean expression]. Each candidate node is tested against its predicates, and only the nodes that satisfy them are kept in the context or in the final result.
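To make the "last step wins" rule concrete, here is a minimal sketch with DOMXPath. The XML is a trimmed copy of the question's document; the second listing (postcode 10115) and the variable names are made up for illustration. It also shows the original typeList-prefixed path matching nothing, presumably because typeList is not a direct child of the node the relative path starts from.

```php
<?php
// Trimmed copy of the question's document; the 10115 entry is invented sample data.
$xml = <<<XML
<realestates:realEstates xmlns:realestates="http://rest.immobilienscout24.de/schema/offer/realestates/1.0">
  <realEstateList>
    <typeList>
      <realEstateElement><address><postcode>31874</postcode></address></realEstateElement>
      <realEstateElement><address><postcode>10115</postcode></address></realEstateElement>
    </typeList>
  </realEstateList>
</realestates:realEstates>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Relative path from the question: typeList is not a child of the starting node.
$relative = $xpath->query('typeList//realEstateElement/address[starts-with(postcode,"31")]');

// Same path with a leading //: the last step is address, so address nodes come back.
$addresses = $xpath->query('//realEstateElement/address[starts-with(postcode,"31")]');

echo $relative->length, "\n";                            // 0
echo $addresses->length, "\n";                           // 1
echo $addresses->item(0)->nodeName, "\n";                // address
echo $addresses->item(0)->parentNode->nodeName, "\n";    // realEstateElement
```

Walking up with parentNode would work, but moving the predicate onto the element you actually want is usually the cleaner option.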

So if you want to select the realEstateElement nodes that match your predicate, you need to have realEstateElement as your last step. The rest of the path (address/postcode) can be used in the predicate, which is evaluated in the context of the realEstateElement node:

//realEstateElement[starts-with(address/postcode,"31")]

This will return all realEstateElement nodes that contain an address element containing a postcode element whose text content starts with 31.
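Since the goal in the question is to remove the listings that do not match, one possible approach is to negate the predicate with not() and delete the selected nodes. This is only a sketch: the XML is a trimmed copy of the question's document, and the second listing (postcode 10115) and the variable names are assumptions made for the example.

```php
<?php
// Trimmed copy of the question's document; the 10115 entry is invented sample data.
$xml = <<<XML
<realestates:realEstates xmlns:realestates="http://rest.immobilienscout24.de/schema/offer/realestates/1.0">
  <realEstateList>
    <typeList>
      <realEstateElement><address><postcode>31874</postcode></address></realEstateElement>
      <realEstateElement><address><postcode>10115</postcode></address></realEstateElement>
    </typeList>
  </realEstateList>
</realestates:realEstates>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Negated predicate: elements whose postcode does NOT start with 31.
// Note that this also matches elements that have no postcode at all.
$unwanted = $xpath->query('//realEstateElement[not(starts-with(address/postcode,"31"))]');

// Copy the result into an array first, then detach each node from its parent.
foreach (iterator_to_array($unwanted) as $element) {
    $element->parentNode->removeChild($element);
}

$remaining = $xpath->query('//realEstateElement');
$firstPostcode = $xpath->evaluate('string(//realEstateElement/address/postcode)');

echo $remaining->length, "\n";   // 1
echo $firstPostcode, "\n";       // 31874
```

After the loop, $doc->saveXML() would give the document without the removed listings.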

Note 1: You can add more predicates if you have to restrict the nodes further:

//realEstateElement[starts-with(address/postcode,"31")][not(starts-with(address/postcode, "318"))]

From the realEstateElement nodes whose address/postcode starts with "31", this selects those whose postcode does not start with "318". Each predicate is evaluated on the set of nodes left by the previous predicate or step.
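A small sketch of the chained predicates in action. The three postcodes are sample data chosen for the example (31874 comes from the question, the other two are made up), and the variable names are illustrative.

```php
<?php
// Sample data: 31874 starts with 318, 31134 starts with 31 only, 10115 starts with neither.
$xml = <<<XML
<realEstateList>
  <typeList>
    <realEstateElement><address><postcode>31874</postcode></address></realEstateElement>
    <realEstateElement><address><postcode>31134</postcode></address></realEstateElement>
    <realEstateElement><address><postcode>10115</postcode></address></realEstateElement>
  </typeList>
</realEstateList>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// First predicate keeps 31874 and 31134; the second then drops 31874.
$matches = $xpath->query(
    '//realEstateElement[starts-with(address/postcode,"31")][not(starts-with(address/postcode,"318"))]'
);

$postcodes = [];
foreach ($matches as $element) {
    // Evaluate a relative expression with the matched element as context node.
    $postcodes[] = $xpath->evaluate('string(address/postcode)', $element);
}

echo implode(',', $postcodes), "\n";   // 31134
```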

Note 2: The namespaces only matter if you need to include a namespaced element in your XPath, which so far doesn't seem to be the case. If you do, you have to register a prefix before you can use it in an expression. On a DOMXPath object (called $xpath here) the method is registerNamespace; registerXPathNamespace is the SimpleXMLElement equivalent:

$xpath->registerNamespace('re', 'http://rest.immobilienscout24.de/schema/offer/realestates/1.0');

The prefix doesn't have to match the one declared in the document (which might not exist, if it is a default namespace). With this, you could use expressions like this one:

/re:realEstates/realEstateList/typeList/realEstateElement[starts-with(address/postcode,"31")]

which also selects realEstateElement using an absolute expression.
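For completeness, a sketch of the absolute expression with DOMXPath, whose method for binding a prefix is registerNamespace. The XML is a trimmed copy of the question's document; the 10115 listing and the variable names are assumptions for the example, and the prefix re is deliberately different from the one used in the document.

```php
<?php
// Trimmed copy of the question's document; the 10115 entry is invented sample data.
$xml = <<<XML
<realestates:realEstates xmlns:realestates="http://rest.immobilienscout24.de/schema/offer/realestates/1.0">
  <realEstateList>
    <typeList>
      <realEstateElement><address><postcode>31874</postcode></address></realEstateElement>
      <realEstateElement><address><postcode>10115</postcode></address></realEstateElement>
    </typeList>
  </realEstateList>
</realestates:realEstates>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Bind our own prefix to the namespace URI of the root element.
$xpath->registerNamespace('re', 'http://rest.immobilienscout24.de/schema/offer/realestates/1.0');

// Only the root element is namespaced; its descendants carry no namespace,
// so they are addressed without a prefix.
$elements = $xpath->query(
    '/re:realEstates/realEstateList/typeList/realEstateElement[starts-with(address/postcode,"31")]'
);

echo $elements->length, "\n";               // 1
echo $elements->item(0)->nodeName, "\n";    // realEstateElement
```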
