
Distinguishing between <foo/> and <foo></foo> in Python XML parsing and generation

I have been using Python's ElementTree to create an XML document, and so far so good. The problem I now face is that, due to project requirements, I need to produce an XML document that contains both empty elements written with start and end tags (<foo></foo>) and self-closing elements (<foo/>), and each must keep its form. The current implementation writes every empty element as a self-closing tag, which is wrong for the elements that need start/end tags. If I instead force start/end tags for empty elements, the self-closing elements are also turned into start/end pairs, which is not correct either.

Can someone please help me out and point me to a possible solution? Any and all suggestions are welcome. I need to use Python 2.7. Thank you.

As far as the XML standard is concerned, an empty element written with start and end tags means exactly the same thing as a self-closing tag.

So, first, this probably isn't a good idea to begin with.

And second, most XML libraries probably aren't going to let you distinguish between the two.

But if you need to do this, you can always patch any library you want. Since you're already using ElementTree, that seems like the obvious choice to patch.


In the latest versions of ElementTree (including the version that comes with Python 3.4+; in older Pythons you'll need to install the latest externally-maintained version), you can actually control this globally, with the short_empty_elements argument to write and related functions. But, as you say, this isn't what you actually want; you need some elements to be self-closing and some not.
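To make the global switch concrete, here is a minimal sketch using the standard-library ElementTree on Python 3.4 or later (it will not run on 2.7, where the argument does not exist). The element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build a tiny tree with one empty child element.
root = ET.Element("root")
ET.SubElement(root, "foo")

# Default behaviour: empty elements are written in the short form.
short_form = ET.tostring(root, encoding="unicode")

# Global override: every empty element gets explicit start and end tags.
long_form = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(short_form)  # <root><foo /></root>
print(long_form)   # <root><foo></foo></root>
```

Note that the flag applies to the whole document, which is exactly why it does not solve the mixed case.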

I think you'd be better off starting from the externally-maintained version of ElementTree, rather than the version that comes built in with Python 2.7. But I'm not sure where its official repo is, so I'm going to link to the Python 3.4 code instead. Hopefully that gives you enough to take it from there.

The key function is _serialize_xml. I think that function isn't C-accelerated, so you only need to change the pure Python version. In which case it's just one line:

if text or len(elem) or not short_empty_elements:

Change it to:

if text or len(elem) or not getattr(elem, 'short_empty', short_empty_elements):

And now, if you set node.short_empty = True or node.short_empty = False on an empty node, it will override the global short_empty_elements setting for that node.
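The effect of the changed condition can be checked in isolation. The sketch below does not touch ElementTree at all: Node is a hypothetical stand-in (a list of children that accepts arbitrary Python attributes), and needs_end_tag is an illustrative helper that only reproduces the patched test.

```python
class Node(list):
    """Hypothetical stand-in for an element: a list of children
    that also accepts arbitrary Python attributes."""


def needs_end_tag(elem, text, short_empty_elements=True):
    # Same condition as the patched line: a per-node `short_empty`
    # attribute wins; without it, the global flag applies.
    return bool(
        text
        or len(elem)
        or not getattr(elem, "short_empty", short_empty_elements)
    )


plain = Node()                 # no override, follows the global flag
forced_long = Node()
forced_long.short_empty = False   # always <foo></foo>
forced_short = Node()
forced_short.short_empty = True   # always <foo />

print(needs_end_tag(plain, None))
print(needs_end_tag(forced_long, None))
print(needs_end_tag(forced_short, None, short_empty_elements=False))
```

Nodes with text or children always get an end tag, so the override only matters for empty nodes.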


Except… I think if you're using the C accelerator, you can't add attributes (I mean Python attributes, like node.short_empty, not XML attributes) to an Element. Which means you'll either need to patch Element to allow this (which is partly in C: you'll have to not disable the __dict__ and modify the else to call PyObject_GenericSetAttr instead of raising), or fake it by, e.g., using some fake XML attribute, which you strip out when serializing.
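The "fake XML attribute" route can be sketched without patching the library, by writing a small serializer of your own. Everything named here is an assumption for illustration: MARKER is a made-up attribute name, and to_xml and _write are hypothetical helpers, not part of ElementTree. The sketch ignores namespaces, comments and processing instructions, so treat it as a starting point only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical marker attribute; pick a name that cannot clash with real data.
MARKER = "_long_form"


def _write(elem, out):
    # Serialize real attributes, dropping the marker from the output.
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in elem.attrib.items()
        if name != MARKER
    )
    long_form = elem.get(MARKER) == "1"
    if elem.text or len(elem) or long_form:
        out.append("<%s%s>" % (elem.tag, attrs))
        if elem.text:
            out.append(escape(elem.text))
        for child in elem:
            _write(child, out)
        out.append("</%s>" % elem.tag)
    else:
        # Empty and unmarked: keep the self-closing form.
        out.append("<%s%s />" % (elem.tag, attrs))
    if elem.tail:
        out.append(escape(elem.tail))


def to_xml(root):
    out = []
    _write(root, out)
    return "".join(out)


root = ET.Element("root")
ET.SubElement(root, "a")                           # stays self-closing
ET.SubElement(root, "b", {MARKER: "1", "id": "x"})  # forced start/end tags
print(to_xml(root))  # <root><a /><b id="x"></b></root>
```

Because the marker is an ordinary XML attribute, this works with the C-accelerated Element too, and the code avoids Python 3 only syntax so it should carry over to 2.7.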

Of course, if you're using ElementTree rather than cElementTree in 2.7, you're not using the C accelerator, so you probably don't need to worry about this part.


You might want to consider looking at the lxml implementation of the ElementTree API to see if it's easier to patch.


Meanwhile, considering that they've added short_empty_elements to the library, the maintainers might be interested in accepting your patch upstream.
