Finding the namespaces in XML with Python's ElementTree when a namespace prefix is redeclared

How do you get the correct namespace when a namespace prefix is redeclared? To elaborate: the prefix stays the same, but the URI bound to it is different, so the prefix is redeclared.

For example, <site:Stack xmlns:site='http://stackoverflow.com'> followed by <site:Stack xmlns:site='https://math.stackexchange.com/'>

From what I have read in Microsoft's XML documentation, this is allowed: it merely redeclares the namespace prefix. If this form is not actually allowed, I can just close this question.
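A quick way to confirm that ElementTree accepts this: the minimal sketch below parses a made-up document (an assumption, built by nesting the two declarations above) and prints each element's tag. Each element is resolved against the declaration in scope at that point, so parsing is not where the problem lies.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the inner element redeclares the 'site' prefix
doc = """<site:Stack xmlns:site='http://stackoverflow.com'>
  <site:Stack xmlns:site='https://math.stackexchange.com/'/>
</site:Stack>"""

root = ET.fromstring(doc)
inner = root[0]
# ElementTree stores names as {uri}local, so the prefix is gone after parsing
print(root.tag)   # {http://stackoverflow.com}Stack
print(inner.tag)  # {https://math.stackexchange.com/}Stack
```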

This is a problem because it breaks most dictionary-based approaches to collecting namespaces, and it also breaks ElementTree.register_namespace(prefix, uri) from the standard library. register_namespace matters because ElementTree parsing gives tags in Clark notation, {uri}localname; for example, the Header element becomes {http://schemas.xmlsoap.org/soap/envelope/}Header. Registering the prefix is what makes ElementTree write that element back out as <SOAP-ENV:Header/> instead of with a generated prefix such as ns0. This causes serious issues for me, because I am deserializing XML into a custom class object and then, after some processing and edits, serializing it back to a well-formed XML file.
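To make the Clark-notation point concrete, here is a minimal sketch using a trimmed-down envelope (assumed sample data, not the full message): the parsed tag carries the URI instead of the prefix, and the prefix only comes back at serialization time once it has been registered.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
doc = f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_NS}"><SOAP-ENV:Header/></SOAP-ENV:Envelope>'

root = ET.fromstring(doc)
header = root[0]
print(header.tag)  # {http://schemas.xmlsoap.org/soap/envelope/}Header

# No prefix registered for this URI yet, so serialization invents one (ns0, ...)
before = ET.tostring(root, encoding="unicode")

# register_namespace is global: every later write uses SOAP-ENV for this URI
ET.register_namespace("SOAP-ENV", SOAP_NS)
after = ET.tostring(root, encoding="unicode")
print(before)
print(after)
```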

For example, take this SOAP fault from Oracle's XML examples. Note how the URI changes from orders to confirm while the prefix stays PO:

<PO:order xmlns:PO="http://gizmos.com/orders/"> changes to <PO:confirmation xmlns:PO="http://gizmos.com/confirm">

<?xml version="1.0" encoding="utf-8" ?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <SOAP-ENV:Fault>
            <faultcode>SOAP-ENV:Client</faultcode>
            <faultstring>Message does not have necessary info</faultstring>
            <faultactor>http://gizmos.com/order</faultactor>
            <detail>
                <PO:order xmlns:PO="http://gizmos.com/orders/">
                    Quantity element does not have a value
                </PO:order>
                <PO:confirmation xmlns:PO="http://gizmos.com/confirm">
                    Incomplete address: no zip code
                </PO:confirmation>
            </detail>
        </SOAP-ENV:Fault>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Note: I am aware that you can build a dictionary of the namespaces as follows, but it keeps only the most recent declaration of each prefix. ET.register_namespace behaves the same way: the most recent registration of a prefix wins.

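import xml.etree.ElementTree as ET

# Each 'start-ns' event yields (prefix, uri); when a prefix is declared
# more than once, later pairs overwrite earlier ones in the dict.
my_namespaces = dict(
    node for _, node in ET.iterparse('file.xml', events=['start-ns'])
)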
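A minimal sketch of where the information gets lost, using a trimmed-down copy of the detail element above as assumed sample data: collecting the start-ns events into a list keeps every (prefix, uri) pair, while turning them into a dict collapses the two PO declarations into one.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed-down version of the fault detail (assumed sample data)
XML = b"""<detail>
  <PO:order xmlns:PO="http://gizmos.com/orders/">Quantity element does not have a value</PO:order>
  <PO:confirmation xmlns:PO="http://gizmos.com/confirm">Incomplete address: no zip code</PO:confirmation>
</detail>"""

# Keep every declaration, in document order, as a (prefix, uri) pair
pairs = [node for _, node in ET.iterparse(io.BytesIO(XML), events=["start-ns"])]
print(pairs)  # [('PO', 'http://gizmos.com/orders/'), ('PO', 'http://gizmos.com/confirm')]

# The dict version keeps only the last binding of 'PO'
print(dict(pairs))  # {'PO': 'http://gizmos.com/confirm'}
```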

This XML is well formed and parses without problems. If you are limited to ElementTree from the standard library, ElementTree.register_namespace(prefix, uri) cannot restore the original prefixes: the prefix-to-URI mapping you would feed it contains the same prefix (PO) with two different URIs, and a Python dictionary cannot hold duplicate keys.
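The register_namespace side can be reproduced in a few lines. The documentation describes the registry as global, with any existing mapping for either the given prefix or the given URI removed, so registering PO a second time evicts the first URI. This is a sketch on a small hand-built tree (assumed sample, not the full envelope); the orders URI then falls back to a generated nsN prefix on output.

```python
import xml.etree.ElementTree as ET

ORDERS = "http://gizmos.com/orders/"
CONFIRM = "http://gizmos.com/confirm"

# Small tree with one element from each namespace
detail = ET.Element("detail")
ET.SubElement(detail, f"{{{ORDERS}}}order")
ET.SubElement(detail, f"{{{CONFIRM}}}confirmation")

# The second call removes the PO -> ORDERS mapping made by the first
ET.register_namespace("PO", ORDERS)
ET.register_namespace("PO", CONFIRM)

out = ET.tostring(detail, encoding="unicode")
print(out)  # order is written with a generated nsN prefix, confirmation with PO
```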

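There is a workaround when you query with XPath, because the prefix mapping you pass in is independent of the prefixes used in the document, so each URI can get its own unique prefix. Using the Oracle XML as an example, P1 can map to http://gizmos.com/orders/ (written as PO in the document) and P2 to http://gizmos.com/confirm (also written as PO).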
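As a sketch of that workaround, using ElementTree's find() with a namespaces argument on a trimmed-down copy of the detail element (assumed sample data): P1 and P2 exist only in the query mapping, never in the document.

```python
import xml.etree.ElementTree as ET

XML = """<detail>
  <PO:order xmlns:PO="http://gizmos.com/orders/">Quantity element does not have a value</PO:order>
  <PO:confirmation xmlns:PO="http://gizmos.com/confirm">Incomplete address: no zip code</PO:confirmation>
</detail>"""

# Query-local prefixes: one unique prefix per URI
ns = {"P1": "http://gizmos.com/orders/", "P2": "http://gizmos.com/confirm"}

root = ET.fromstring(XML)
order = root.find("P1:order", ns)
confirmation = root.find("P2:confirmation", ns)
print(order.text)         # Quantity element does not have a value
print(confirmation.text)  # Incomplete address: no zip code
```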

Without editing the original XML and using only ElementTree, one will have to manually address this.
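One way to handle it manually, sketched under several assumptions: only elements, attributes, text and tails are round-tripped (comments, processing instructions and the XML declaration are not), and an un-namespaced element nested under a default namespace (which would need xmlns="") is not handled. The idea is to record which (prefix, uri) pairs were declared on each element while parsing with iterparse, then write the tree back with a small hand-rolled serializer that resolves each URI against the bindings in scope. parse_with_ns_scopes and serialize are hypothetical helper names, not part of ElementTree.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_NS = "http://www.w3.org/XML/1998/namespace"


def parse_with_ns_scopes(source):
    """Return (root, decls); decls maps each element to the (prefix, uri)
    pairs declared on that element, in document order."""
    decls, pending, root = {}, [], None
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            pending.append(item)  # fires before the start event of its element
        else:
            if root is None:
                root = item
            decls[item] = pending
            pending = []
    return root, decls


def _qname(name, scope, is_attr=False):
    """Turn '{uri}local' back into 'prefix:local' using the bindings in scope."""
    if not name.startswith("{"):
        return name
    uri, local = name[1:].split("}", 1)
    for prefix, bound in scope.items():
        # Unprefixed attributes are never namespaced, so skip the default binding
        if bound == uri and (prefix or not is_attr):
            return f"{prefix}:{local}" if prefix else local
    raise ValueError(f"no prefix in scope for {uri}")


def serialize(elem, decls, scope=None):
    """Write elem back out, re-emitting each xmlns declaration where it was."""
    # prefix -> uri bindings in effect; 'xml' is always bound implicitly
    scope = dict(scope) if scope else {"xml": XML_NS}
    own = decls.get(elem, [])
    scope.update(own)  # a redeclared prefix overrides the parent's binding
    tag = _qname(elem.tag, scope)
    parts = ["<", tag]
    for prefix, uri in own:
        parts.append(f" xmlns:{prefix}=" if prefix else " xmlns=")
        parts.append(quoteattr(uri))
    for key, value in elem.attrib.items():
        parts.append(f" {_qname(key, scope, is_attr=True)}={quoteattr(value)}")
    parts.append(">")
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(serialize(child, decls, scope))
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


SAMPLE = b"""<detail>
  <PO:order xmlns:PO="http://gizmos.com/orders/">Quantity element does not have a value</PO:order>
  <PO:confirmation xmlns:PO="http://gizmos.com/confirm">Incomplete address: no zip code</PO:confirmation>
</detail>"""

root, decls = parse_with_ns_scopes(io.BytesIO(SAMPLE))
out = serialize(root, decls)
print(out)
```

Because each element's declarations are re-emitted on that same element, both PO bindings survive the round trip; the trade-off is that edits which add new namespaced elements must also add a matching entry to decls.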
