
Unable to parse attribute in XML with xts: prefix

I have a Python script that parses an XML document using the ElementTree library. I am able to parse all of the data, including attribute data; however, a few attributes have "xts:" as a prefix.

So:

var1 = child.attrib['abc']
var2 = child.attrib['xts:xyz']

When I run the script, it collects the "abc" attribute data, but the "xts:xyz" attribute data is null, even though that attribute has content.

It doesn't sound like ":" is a special character in Python that I need to escape. Any ideas?

The issue here is that xts is a namespace prefix. It's not necessary to escape it, but ElementTree has to be told about the namespace for lookups to work properly.

For example, this code (using XPath syntax in findall):

import xml.etree.ElementTree as ET

xmlStr = """<?xml version="1.0" encoding="UTF-8"?>
<stuff xmlns:xts="http://www.stackoverflow.com">
    <abc foo="bar">Baz</abc>
    <xts:xyz narf="poit">troz</xts:xyz>
</stuff>
"""    

namespaces = {"xts": "http://www.stackoverflow.com"}

root = ET.fromstring(xmlStr)

abcNode = root.findall("./abc", namespaces=namespaces)
xyzNode = root.findall("./xts:xyz", namespaces=namespaces)

Yields these results:

>>> print(abcNode[0].attrib)
{'foo': 'bar'}
>>> print(xyzNode[0].attrib)
{'narf': 'poit'}

For more discussion/details about parsing namespaces using ElementTree, you can refer to Parsing XML with namespace in Python via 'ElementTree'.

Edit in response to comment from OP:

Given this code (added to the above code for the import, etc.), which reflects the colon in the attribute of the xyz node:
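The same idea applies when the prefix sits on the attribute name itself, as in the question. Here is a minimal sketch, assuming a hypothetical child element whose xyz attribute carries the xts: prefix (the element name, attribute values, and the XTS_URI constant are made up for illustration). ElementTree stores a namespaced attribute under a key of the form {namespace-uri}localname, so the lookup uses that key rather than 'xts:xyz':

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the attribute name itself carries the xts: prefix.
xml_str = """<stuff xmlns:xts="http://www.stackoverflow.com">
    <child abc="plain" xts:xyz="prefixed"/>
</stuff>"""

# Must match the URI declared by xmlns:xts in the document.
XTS_URI = "http://www.stackoverflow.com"

root = ET.fromstring(xml_str)
child = root.find("./child")

# An attribute without a prefix is looked up by its plain name.
var1 = child.attrib["abc"]

# A prefixed attribute is stored under "{uri}localname".
var2 = child.attrib["{%s}xyz" % XTS_URI]

# The literal prefixed name is not a key, so .get() returns None here.
missing = child.attrib.get("xts:xyz")

print(var1, var2, missing)
```

Printing child.attrib on your own document is a quick way to see the exact keys ElementTree produced.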

xmlStr2 = """<?xml version="1.0" encoding="UTF-8"?>
<stuff>
    <abc foo="bar">Baz</abc>
    <xyz narf="xts:poit">troz</xyz>
</stuff>
"""

root2 = ET.fromstring(xmlStr2)

abcNode2 = root2.findall("./abc")
xyzNode2 = root2.findall("./xyz")

print("abc2 attrib: {0}".format(abcNode2[0].attrib))
print("xyz2 attrib: {0}".format(xyzNode2[0].attrib))

This outputs:

abc2 attrib: {'foo': 'bar'}
xyz2 attrib: {'narf': 'xts:poit'}

So ElementTree doesn't have an issue with parsing an attribute containing a colon.

You mentioned in your comment that:

I still get a key error, regardless if I use xyzNode.attrib['poit'] or xyzNode.attrib['xts:poit']

I think the crux of that issue (at least with regard to findall) is that what it returns is a list of Element objects (even if it's just a single Element), as seen here:

>>> print(xyzNode2)
[<Element 'xyz' at 0x7f59bed39150>]

So in order to use attrib, you need to access an element within that list. You could use a for-in loop to loop over all of them and process them (or in this case the single one) accordingly, or if you know there's only one, you can just access it directly using a [0] subscript, as I did above.
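As a rough sketch of both options, using the same sample document as xmlStr2 (the variable names here are my own): findall gives a list to iterate over, while find gives the first match or None. Note also that in this sample narf is the attribute name and xts:poit is its value, so the dictionary key is 'narf'.

```python
import xml.etree.ElementTree as ET

xml_str = """<stuff>
    <abc foo="bar">Baz</abc>
    <xyz narf="xts:poit">troz</xyz>
</stuff>"""

root = ET.fromstring(xml_str)

# findall returns a list of Elements, so loop over it
# (or index it with [0] if you know there is exactly one match).
values = []
for node in root.findall("./xyz"):
    values.append(node.attrib["narf"])

# find returns the first matching Element, or None if nothing matched.
first = root.find("./xyz")
first_value = first.get("narf") if first is not None else None

print(values)
print(first_value)
```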
