
Using ElementTree to parse an XML string with a namespace

I have Googled my pants off to no avail. What I am trying to do is very simple: using ElementTree, I'd like to access the UniqueID value in the following XML, which is contained in a string.

from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

tree = fromstring(xml_string)

I know that I should use the fromstring method to parse the XML string, but I can't work out how to access the UniqueID. I'm not certain how to use the find, findall, or findtext methods with respect to the namespace.

Any help is much appreciated.

The following should get you going:

>>> tree.findall('*/*')
[<Element '{http://www.example.com/dir/}UniqueID' at 0x10899e450>]

This lists all the elements that are two levels below the root of your tree (the UniqueID element, in your case). Alternatively, you can find only the first element at this level with tree.find(). You can then directly get the text contents of the UniqueID element:

>>> unique_id_elmt = tree.find('*/*')  # First (and only) element two levels below the root
>>> unique_id_elmt
<Element '{http://www.example.com/dir/}UniqueID' at 0x105ec9450>
>>> unique_id_elmt.text  # Text contained in UniqueID
'abcdefghijklmnopqrstuvwxyz0123456789'
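
The same lookup can also be written as a standalone script with findtext, one of the methods mentioned in the question. This is only a minimal sketch: the '*/*/*' path and the 'not found' default are illustrative choices, not part of the original problem.

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

tree = fromstring(xml_string)

# '*/*' matches any element two levels below the root, whatever its namespace;
# findtext returns the text of the first match.
unique_id = tree.findtext('*/*')
print(unique_id)

# Nothing sits three levels below the root here, so the default is returned.
missing = tree.findtext('*/*/*', default='not found')
print(missing)
```

Note that the wildcard path relies on the position of UniqueID in the document rather than on its name, so it is only suitable when the structure is known in advance.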

Alternatively, you can directly find a specific element by giving its full path:

>>> tree.find('{{{0}}}Item/{{{0}}}UniqueID'.format(NS))  # Tags are prefixed with NS
<Element '{http://www.example.com/dir/}UniqueID' at 0x10899ead0>

As Tomalak indicated, Fredrik Lundh's site might contain useful information; you may want to check how prefixes can be handled: there might in fact be a simpler way to handle them than spelling out the NS path explicitly as in the method above.
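
One way prefixes can be handled is the namespaces argument that find, findall, and findtext accept in the standard library's ElementTree. The sketch below assumes a prefix of our own choosing ("d" is arbitrary and does not appear in the XML) mapped to the namespace URI:

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

# Map an arbitrary prefix ("d" is our own choice) to the namespace URI.
namespaces = {"d": "http://www.example.com/dir/"}

tree = fromstring(xml_string)

# "d:Item" is expanded to "{http://www.example.com/dir/}Item" during the search.
unique_id = tree.findtext("d:Item/d:UniqueID", namespaces=namespaces)
print(unique_id)
```

Python 3.8 and later also accept {*} as a namespace wildcard in these paths, for example tree.findtext('{*}Item/{*}UniqueID').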

I know there will be some yells of horror and retaliatory downvotes for my answer, because I use the re module to analyse an XML string, but note that:

  • in the majority of cases, the following solution won't cause any problem

  • I wish the downvoters would give examples of cases that cause problems with my solution

  • I don't parse the string, taking the word 'parse' in the sense of 'obtaining a tree before analysing the tree to find what is searched for'; I analyse it: I directly find the wanted portion of text where it is

I don't claim that an XML string must always be analysed with the help of re. In the majority of cases, an XML string should probably be parsed with a dedicated parser. I only say that in simple cases like this one, in which a simple and fast analysis is possible, it is easier to use the regex tool, which is, by the way, faster.

import re

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

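# print() with a single argument behaves the same on Python 2 and Python 3
print(xml_string)
print('\n=================================\n')

# Non-greedy match of everything between the UniqueID tags
print(re.search('<UniqueID>(.+?)</UniqueID>', xml_string, re.DOTALL).group(1))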

Result:

<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>

=================================

abcdefghijklmnopqrstuvwxyz0123456789
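
For comparison, here is a sketch of one case where the regex and a parser give different results. The input is hypothetical: the same structure as above, but with an escaped ampersand in the value. The guard against a missing match is also an addition, since re.search returns None when the tag is absent.

```python
import re
from xml.etree.ElementTree import fromstring

# Hypothetical input: the UniqueID value contains an escaped "&".
xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abc&amp;123</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

# The regex returns the raw text between the tags, entity included.
match = re.search('<UniqueID>(.+?)</UniqueID>', xml_string, re.DOTALL)
regex_value = match.group(1) if match else None

# ElementTree decodes the entity while parsing.
tree = fromstring(xml_string)
parsed_value = tree.findtext('{{{0}}}Item/{{{0}}}UniqueID'.format(NS))

print(regex_value)   # abc&amp;123
print(parsed_value)  # abc&123
```

For an identifier made only of letters and digits, as in the question, both approaches return the same string.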
