
Selecting an XPath child node without the parent namespace

I do not work with XML often and have never used XPath. I am trying to parse an XML document with Python and lxml, using XPath. lxml relies on libxml2, which implements only XPath 1.0, so I do not have access to XPath 2.0 features. I am working from a list of XPaths provided by a client, and none of them include namespaces.

These are for a RETS server response from the Canadian Real Estate Association, if that helps. Their documentation is here: http://www.crea.ca/wp-content/uploads/2016/02/DDFDataFeedTechnicalDoc-2016-3.pdf

The paths are formatted like the following (there are many more of them):

Building/SizeInterior
Land/SizeTotal

The parent element has the namespace "urn:CREA.Search.Property", as seen in the following example response:

<?xml version="1.0" encoding="UTF-8"?>
<RETS ReplyCode="0" ReplyText="Operation successful">
   <COUNT Records="1" />
   <RETS-RESPONSE xmlns="urn:CREA.Search.Property">
      <Pagination>
         <TotalRecords>1</TotalRecords>
         <Limit>100</Limit>
         <Offset>1</Offset>
         <TotalPages>1</TotalPages>
         <RecordsReturned>1</RecordsReturned>
      </Pagination>
      <PropertyDetails ID="XXXXXXXXXX" LastUpdated="Sun, 12 Jun 2016 14:21:20 GMT">
         <Building>
            <SizeInterior />
            <Type>No Building</Type>
            <UtilityWater>Private Utility</UtilityWater>
         </Building>
         <Land>
            <SizeTotal>0.28 ac|under 1 acre</SizeTotal>
            <SizeTotalText>0.28 ac|under 1 acre</SizeTotalText>
            <AccessType>Easy access</AccessType>
            <Acreage>false</Acreage>
            <SizeIrregular>0.28</SizeIrregular>
         </Land>
      </PropertyDetails>
   </RETS-RESPONSE>
</RETS>

I need to be able to grab those elements without having to modify the XPaths if possible.

What I've found so far suggests that even if the namespace is declared only on a parent element, I need to specify it for every child in the path. That would make the paths provided by my client usable only if I process them to add the namespace before each element.

Is that correct or is there a way that would be cleaner? This strikes me as messy: if the children don't have a namespace explicitly assigned to them, why would the XPath have to be explicit about it?

I assume I'm missing something.

You haven't said much about your technology constraints. If you are able to use an XPath 2.0 processor, then you should be able to define the "default namespace for elements and types" as urn:CREA.Search.Property, and paths using unprefixed names like Building/SizeInterior will then treat the element names as being in this namespace.
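If no XPath 2.0 processor is available, something similar can be approximated in Python. The sketch below uses the standard library's xml.etree.ElementTree (Python 3.8 or later), not lxml and not XPath 2.0: its find() accepts a namespaces mapping in which the empty-string prefix moves all unprefixed tag names in the path into the given namespace. It only supports ElementTree's limited path syntax, and DOC is a trimmed copy of the response from the question, so treat it as an illustration rather than a drop-in solution.

```python
import xml.etree.ElementTree as ET

NS = "urn:CREA.Search.Property"

# Trimmed copy of the example response from the question.
DOC = """<RETS ReplyCode="0" ReplyText="Operation successful">
  <RETS-RESPONSE xmlns="urn:CREA.Search.Property">
    <PropertyDetails ID="XXXXXXXXXX">
      <Building><SizeInterior /><Type>No Building</Type></Building>
      <Land><SizeTotal>0.28 ac|under 1 acre</SizeTotal></Land>
    </PropertyDetails>
  </RETS-RESPONSE>
</RETS>"""

root = ET.fromstring(DOC)

# Locate the context element using its fully qualified name.
details = root.find(f".//{{{NS}}}PropertyDetails")

# The "" key makes unprefixed names in the path resolve in NS.
default_ns = {"": NS}
size_total = details.find("Land/SizeTotal", default_ns)
building_type = details.find("Building/Type", default_ns)

# Without the mapping, the unprefixed names mean "no namespace" and match nothing.
without_mapping = details.find("Land/SizeTotal")

print(size_total.text)
print(building_type.text)
print(without_mapping)
```

The client's paths are used unchanged here; only the namespaces mapping passed alongside them differs.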

(The reason that XPath doesn't treat n:aaa/bbb as meaning n:aaa/n:bbb is that it's quite legitimate to have a no-namespace element bbb as a child of a namespaced element n:aaa.)
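With an XPath 1.0 processor, the remaining option is to add a prefix to each step before evaluating the path. Below is a minimal sketch of that, assuming the paths are simple slash-separated element names like the client's (no predicates or function calls). The helper name qualify and the prefix c are made up for this example. The demonstration uses the standard library's ElementTree so that it runs without extra dependencies; the same prefixed path and prefix mapping should also work with lxml's xpath(path, namespaces=...). The small test document also shows why unprefixed names cannot simply inherit the parent's namespace: both kinds of bbb can sit under the same parent.

```python
import xml.etree.ElementTree as ET

NS = "urn:CREA.Search.Property"


def qualify(path, prefix="c"):
    """Prefix every unprefixed element step, e.g. 'a/b' -> 'c:a/c:b'.

    Hypothetical helper: only handles simple slash-separated paths.
    """
    steps = []
    for step in path.split("/"):
        # Leave empty steps (from '//'), '.', '..', '*', attribute steps
        # and already-prefixed names untouched.
        is_plain_name = (
            step
            and step not in (".", "..", "*")
            and not step.startswith("@")
            and ":" not in step
        )
        steps.append(f"{prefix}:{step}" if is_plain_name else step)
    return "/".join(steps)


# A namespaced parent with one namespaced and one no-namespace child.
DOC = """<n:aaa xmlns:n="urn:CREA.Search.Property">
  <n:bbb>namespaced</n:bbb>
  <bbb>no namespace</bbb>
</n:aaa>"""

root = ET.fromstring(DOC)
nsmap = {"c": NS}  # the prefix is arbitrary; only the URI has to match

unqualified = [e.text for e in root.findall("bbb")]
qualified = [e.text for e in root.findall(qualify("bbb"), nsmap)]

print(qualify("Building/SizeInterior"))
print(unqualified, qualified)
```

The prefix used in the path does not need to match anything in the document, since matching is done on the namespace URI.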
