
Using ElementTree to parse an XML string with a namespace

I have Googled my pants off to no avail. What I am trying to do is very simple: I'd like to access the UniqueID value in the following XML contained in a string using ElementTree.

from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

tree = fromstring(xml_string)

I know that I should use the fromstring method to parse the XML string, but I can't seem to identify how to access the UniqueID. I'm not certain how to use the find, findall, or findtext methods with respect to the namespace.

Any help is totally appreciated.

The following should get you going:

>>> tree.findall('*/*')
[<Element '{http://www.example.com/dir/}UniqueID' at 0x10899e450>]

This lists all the elements that are two levels below the root of your tree (the UniqueID element, in your case). You can, alternatively, find only the first element at this level, with tree.find(). You can then directly get the text contents of the UniqueID element:

>>> unique_id_elmt = tree.find('*/*')  # First (and only) element two levels below the root
>>> unique_id_elmt
<Element '{http://www.example.com/dir/}UniqueID' at 0x105ec9450>
>>> unique_id_elmt.text  # Text contained in UniqueID
'abcdefghijklmnopqrstuvwxyz0123456789'

Alternatively, you can directly find a precise element by specifying its full path:

>>> tree.find('{{{0}}}Item/{{{0}}}UniqueID'.format(NS))  # Tags are prefixed with NS
<Element '{http://www.example.com/dir/}UniqueID' at 0x10899ead0>
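
To see why the namespace has to appear in the path, here is a minimal sketch that reuses xml_string and NS from the question. The helper name qualified is hypothetical, introduced only for illustration; it just builds the '{namespace}tag' form that ElementTree stores for each element.

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"
tree = fromstring(xml_string)


def qualified(tag, ns=NS):
    """Hypothetical helper: build the '{namespace}tag' form ElementTree uses."""
    return '{%s}%s' % (ns, tag)


# Parsed tags carry the namespace, so an unqualified path matches nothing.
root_tag = tree.tag
unqualified = tree.find('Item/UniqueID')

# With every step qualified, findtext returns the element text directly.
unique_id = tree.findtext('/'.join([qualified('Item'), qualified('UniqueID')]))

print(root_tag)
print(unqualified)
print(unique_id)
```

The unqualified lookup returning None is the usual symptom of a default namespace (xmlns='...') on the root element.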

As Tomalak indicated, Fredrik Lundh's site might contain useful information; you may want to check how prefixes can be handled: there might in fact be a simpler way to handle them than by spelling out the NS path explicitly as in the method above.
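
One such way, as a sketch: find, findall, and findtext accept an optional namespaces mapping from prefix to URI, so the path can use short prefixes instead of the full '{...}' form. The prefix 'd' and the name ns_map below are arbitrary choices made for this example, not something the XML itself defines.

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

# 'd' is an arbitrary prefix picked for this sketch; only the URI must match.
ns_map = {'d': 'http://www.example.com/dir/'}

tree = fromstring(xml_string)

# Each step of the path is written with the prefix from ns_map.
elem = tree.find('d:Item/d:UniqueID', ns_map)
text = tree.findtext('d:Item/d:UniqueID', namespaces=ns_map)

# './/' searches all descendants, which helps when the nesting depth varies.
all_ids = [e.text for e in tree.findall('.//d:UniqueID', ns_map)]

print(elem.tag)
print(text)
print(all_ids)
```

The prefix used in the path does not need to match any prefix in the document; the lookup is done on the URI.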

I know there will be some yells of horror and downvotes for my answer as retaliation, because I use the re module to analyse an XML string, but note that:

  • in the majority of cases, the following solution won't cause any problem

  • I wish the downvoters would give examples of cases that cause problems with my solution

  • I don't parse the string, taking the word 'parse' in the sense of 'obtaining a tree before analysing the tree to find what is searched for'; I analyse it: I directly find the wished-for portion of text where it is

I don't claim that an XML string must always be analysed with the help of re. In the majority of cases an XML string should probably be parsed with a dedicated parser. I only say that in simple cases like this one, in which a simple and fast analysis is possible, it is easier to use the regex tool, which is, by the way, faster.

import re

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

print(xml_string)
print('\n=================================\n')

# Non-greedy match of the text between the opening and closing UniqueID tags
print(re.search(r'<UniqueID>(.+?)</UniqueID>', xml_string, re.DOTALL).group(1))

Result:

<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>

=================================

abcdefghijklmnopqrstuvwxyz0123456789
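
As for cases where the two approaches diverge, here is a sketch comparing the regex above with an ElementTree lookup on two hypothetical variants of the input. Neither variant comes from the question: one adds an attribute to the UniqueID tag, the other puts an escaped ampersand in its text. The function names by_regex and by_parser are made up for this comparison.

```python
import re
from xml.etree.ElementTree import fromstring

NS = 'http://www.example.com/dir/'
PATTERN = re.compile(r'<UniqueID>(.+?)</UniqueID>', re.DOTALL)


def by_regex(xml_string):
    """Return the regex capture, or None when the pattern does not match."""
    match = PATTERN.search(xml_string)
    return match.group(1) if match else None


def by_parser(xml_string):
    """Return the UniqueID text via ElementTree, using the qualified path."""
    return fromstring(xml_string).findtext('{%s}Item/{%s}UniqueID' % (NS, NS))


# Hypothetical variant 1: the tag carries an attribute.
with_attribute = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
<Item><UniqueID kind='primary'>abc123</UniqueID></Item>
</ListObjectsResponse>"""

# Hypothetical variant 2: the text contains an escaped ampersand.
with_entity = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
<Item><UniqueID>a&amp;b</UniqueID></Item>
</ListObjectsResponse>"""

attribute_results = (by_regex(with_attribute), by_parser(with_attribute))
entity_results = (by_regex(with_entity), by_parser(with_entity))

print(attribute_results)  # regex finds nothing; the parser still returns the text
print(entity_results)     # regex keeps the raw '&amp;'; the parser decodes it
```

If the producer of the XML guarantees that neither situation can occur, the regex remains adequate for this input; the sketch only marks where the two approaches stop agreeing.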
