
Parsing xml in python to get all child elements

I have parsed an XML file to get all of its elements, and I am getting the following output:

[<Element '{urn:mitel:params:xml:ns:yang:vld}vld-list' at 0x0000000003059188>, <Element '{urn:mitel:params:xml:ns:yang:vld}vl-id' at 0x00000000030689F8>, <Element '{urn:mitel:params:xml:ns:yang:vld}descriptor-version' at 0x0000000003068A48>]

For each element in the list, I need only the part of the name between } and '.

This is my code so far:

import xml.etree.ElementTree as ET  
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')  
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:  
    all_descendants = list(elem.iter())
    print(all_descendants)

How can I achieve this?

The text in {} is the namespace part of the qualified name (QName) of the XML element. AFAIK there is no method in ElementTree that returns only the local name, so you have to either

  • extract the local part of the name with string handling, as already proposed in a comment to your question,
  • use lxml.etree instead of xml.etree.ElementTree and apply xpath('local-name()') on each element,
  • or provide an XML source without a namespace. You can strip the namespace with XSLT.

So, given this XML input:

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="urn:mitel:params:xml:ns:yang:vld">
    <bar>
        <baz x="1"/>
        <yet>
            <more>
                <nested/>
            </more>
        </yet>
    </bar>
    <bar/>
</foo>

You can print a list of the local names only with this variation of your program:

import xml.etree.ElementTree as ET  
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')  
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:
    # [-1] rather than [1], so a tag without a namespace does not raise IndexError
    all_descendants = [e.tag.split('}', 1)[-1] for e in elem.iter()]
    print(all_descendants)

Output:

['bar', 'baz', 'yet', 'more', 'nested']
['bar']
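
For trying the string-handling approach without a file on disk, here is a minimal self-contained sketch. The helper name `local_name` and the inline `SAMPLE` string are made up for this illustration and are not part of ElementTree. It uses `rpartition('}')`, so a tag that carries no namespace passes through unchanged.

```python
import xml.etree.ElementTree as ET

# The sample document from above, inlined so the sketch runs without a file.
SAMPLE = """<foo xmlns="urn:mitel:params:xml:ns:yang:vld">
    <bar>
        <baz x="1"/>
        <yet>
            <more>
                <nested/>
            </more>
        </yet>
    </bar>
    <bar/>
</foo>"""


def local_name(tag):
    """Return the part of an ElementTree tag after '{namespace}'.

    Hypothetical helper, not an ElementTree API. Tags without a
    namespace are returned unchanged.
    """
    return tag.rpartition('}')[2]


root = ET.fromstring(SAMPLE)

# One list of local names per direct child of the root element.
names_per_child = [[local_name(e.tag) for e in child.iter()] for child in root]
for names in names_per_child:
    print(names)
```

This should print the same two lists as the output shown above.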

The version with lxml.etree and xpath('local-name()') looks like this:

import lxml.etree as ET
tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')  
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:
    all_descendants = [e.xpath('local-name()') for e in elem.iter()]
    print(all_descendants)

The output is the same as with the string handling version.


For stripping the namespace completely from your input, you can apply this XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

Then your original program outputs:

[<Element 'bar' at 0x04583B40>, <Element 'baz' at 0x04583B70>, <Element 'yet' at 0x04583BD0>, <Element 'more' at 0x04583C30>, <Element 'nested' at 0x04583C90>]
[<Element 'bar' at 0x04583CC0>]

Now the elements themselves do not bear a namespace. So, you don't have to strip it anymore.

You can apply the XSLT with xsltproc, so that you don't need to change your program. Alternatively, you can apply the XSLT in Python, but this also requires lxml.etree. The last variation of your program then looks like this:

import lxml.etree as ET

tree = ET.parse('UMR_VLD01_OAM_V6-Provider_eth0.xml')  
xslt = ET.parse('stripns.xslt')
transform = ET.XSLT(xslt)
tree = transform(tree)

root = tree.getroot()
# all items
print('\nAll item data:')
for elem in root:
    all_descendants = list(elem.iter())
    print(all_descendants)
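
If neither xsltproc nor lxml is available, a comparable result can probably be had with the standard library alone by rewriting each tag in place after parsing. This is a different technique from the XSLT above, not an equivalent of it: it only renames elements and leaves everything else as parsed. The function name `strip_namespaces` and the shortened inline document are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Drop the '{namespace}' prefix from every element tag, in place.

    Hypothetical helper, not an ElementTree API.
    """
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(elem.tag, str):
            elem.tag = elem.tag.rpartition('}')[2]
    return root


# Shortened version of the sample document, inlined for the sketch.
root = strip_namespaces(ET.fromstring(
    '<foo xmlns="urn:mitel:params:xml:ns:yang:vld">'
    '<bar><baz x="1"/></bar><bar/>'
    '</foo>'
))

# After stripping, the plain tags can be used directly.
tags_per_child = [[e.tag for e in child.iter()] for child in root]
print(tags_per_child)
```

After this call, the loop from the original program works on `root` without any further string handling.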
