How to get the prefix part of an XML namespace in Python?

I have the following XML (in brief):

<?xml version="1.0" encoding="iso-8859-1"?>
<SOAP-ENV:Envelope 
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <SOAP-ENV:Body>
        <Proposal 
            xmlns="http://www.opengis.net/AAA"  
            xmlns:apd="http://www.opengis.net/BBB" 
            xmlns:common="http://www.opengis.net/DDD"  
            xmlns:core="http://www.opengis.net/EEE" 
            xmlns:pdt="http://www.opengis.net/CCC" 
            xmlns:xlink="http://www.opengis.net/FFF" 
            xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            
            <SchemaVersion>1.3</SchemaVersion>
            <ApplicationHeader>
                <ApplicationTo>W1234                         </ApplicationTo>
                <DateSubmitted>2021-04-26</DateSubmitted>
# ...
            <Agent>
                <common:PersonName>
                    <pdt:PersonNameTitle>Mr </pdt:PersonNameTitle>
                    <pdt:PersonGivenName>Holmes</pdt:PersonGivenName>
                    <pdt:PersonFamilyName>Sherlock</pdt:PersonFamilyName>
                </common:PersonName>
                <common:OrgName>Bad Company LTD</common:OrgName>
        </Proposal>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I am trying to extract the XML tags together with the prefix part of each namespace (basically the way they look in the XML). Using Python 3.10.3, I have tried many variations of the following.


from lxml import html, etree
...

def list_xml_tags(xml_blob):
    print(' [INFO] Printing all XML tags:')
    
    xml = etree.fromstring(bytes(xml_blob, encoding='utf-8'))  
    root = etree.Element("root")
    
    print('Root TAG: {}'.format(root))
    print('nsmap : {}'.format(root.nsmap))
    print('\nDescendants:')
    
    for el in xml.iter():
        el.tag = el.xpath('local-name()')
        #ns = el.xpath('namespace-uri()')
        #ns = etree.QName(el).namespace
        #ns = root.nsmap
        ns = etree.QName(el).namespace
        if el.attrib == None: el.attrib =''
        print('{} : {}  : {}'.format(ns, el.tag, el.attrib))

However, this is not working: I cannot get the namespace at all, and the only thing that comes out is None. (I am also not sure why the root tag is shown as an address.)

 [INFO] Printing all XML tags:
 ------------------------------------------------------------
Root TAG: <Element root at 0x16a85fc2340>
nsmap : {}

Descendants:
None : Envelope  : {}
None : Body  : {}
None : Proposal  : {}
None : SchemaVersion  : {}
...

Q: How can I get the following output?

SOAP-ENV : Envelope
pdt      : PersonGivenName
common   : OrgName
...

etc.

With Python 3 and lxml, build a reverse lookup from namespace URI to prefix:

from lxml import etree

doc = etree.parse('tmp.xml')
# Reverse lookup dict: namespace URI -> prefix ('default' for the unprefixed namespace)
ns = {
    uri: (prefix if prefix is not None else 'default')
    for prefix, uri in set(doc.xpath('//*/namespace::*'))
}
for ele in doc.iter():
    qn = etree.QName(ele)
    # Right-align the prefix so the colons line up
    print(f"{ns[qn.namespace]:>15} : {qn.localname}")

Result:
Entries labelled default belong to the default namespace, which is declared without a prefix: xmlns="http://www.opengis.net/AAA".

       SOAP-ENV : Envelope
       SOAP-ENV : Body
        default : Proposal
        default : SchemaVersion
        default : ApplicationHeader
        default : ApplicationTo
        default : DateSubmitted
        default : Agent
         common : PersonName
            pdt : PersonNameTitle
            pdt : PersonGivenName
            pdt : PersonFamilyName
         common : OrgName
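
The same reverse-lookup idea can also be sketched with the standard library alone, in case lxml is not installed. This is only a sketch under stated assumptions, not the method used in the answer above: it swaps lxml's namespace axis for the "start-ns" events of xml.etree.ElementTree.iterparse. The names list_prefixed_tags, SAMPLE and default_label are hypothetical, and SAMPLE is a shortened, well-formed stand-in for the XML in the question.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the document in the question (assumption).
SAMPLE = """<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <Proposal xmlns="http://www.opengis.net/AAA"
              xmlns:common="http://www.opengis.net/DDD"
              xmlns:pdt="http://www.opengis.net/CCC">
      <SchemaVersion>1.3</SchemaVersion>
      <Agent>
        <common:PersonName>
          <pdt:PersonGivenName>Holmes</pdt:PersonGivenName>
        </common:PersonName>
        <common:OrgName>Bad Company LTD</common:OrgName>
      </Agent>
    </Proposal>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def list_prefixed_tags(xml_text, default_label="default"):
    """Return (prefix, localname) pairs for every element, in document order."""
    uri_to_prefix = {}  # namespace URI -> prefix; a later declaration overwrites an earlier one
    pairs = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # payload is (prefix, uri); the prefix is empty for the default namespace
            prefix, uri = payload
            uri_to_prefix[uri] = prefix or default_label
        else:
            # payload is the element; ElementTree stores its tag as "{uri}localname"
            tag = payload.tag
            if tag.startswith("{"):
                uri, local = tag[1:].split("}", 1)
                pairs.append((uri_to_prefix.get(uri, "?"), local))
            else:
                pairs.append(("", tag))  # element in no namespace
    return pairs


for prefix, local in list_prefixed_tags(SAMPLE):
    print(f"{prefix:>15} : {local}")
```

Like the dict in the answer, this keeps a single prefix per namespace URI, so a URI bound to two different prefixes is reported under whichever was declared last. When lxml is in use, each element also exposes its prefix directly as el.prefix (None for the default namespace), provided el.tag has not been overwritten with the local name first, as the loop in the question does.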
