
How can I prevent lxml from auto-closing empty elements when serializing to string?

I am parsing a huge XML file that contains many empty elements, such as

<MemoryEnv></MemoryEnv>

When serializing with

etree.tostring(root_element, pretty_print=True)

the output element is collapsed to

<MemoryEnv/>

Is there any way to prevent this? etree.tostring() does not seem to provide such an option.

Is there a way to interfere with lxml's tostring() serializer?

By the way, the html module does not work. It is not designed for XML, and it does not write empty elements in their original form.

The problem is that, although the collapsed and uncollapsed forms of an empty element are equivalent, the program that parses this file does not work with collapsed empty elements.

Here is a way to do it. Ensure that the text value of every empty element is not None. An element whose text is an empty string is serialized as a start tag and an end tag rather than in the collapsed form.

Example:

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

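# Give every element that has no text an empty string,
# so that it is written as <tag></tag> instead of <tag/>.
for elem in doc.iter():
    if elem.text is None:
        elem.text = ''

# encoding='unicode' returns a str rather than bytes (Python 3)
print(etree.tostring(doc, encoding='unicode'))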

Output:

<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>
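
The loop above can be wrapped in a small reusable function. What follows is only a minimal Python 3 sketch: the name expand_empty_elements is hypothetical (it is not part of lxml), and it assumes that only elements with neither children nor text need to be expanded.

```python
from lxml import etree


def expand_empty_elements(root):
    """Set text to '' on every childless element that has no text.

    Hypothetical helper, not part of lxml. An element whose text is ''
    instead of None is serialized as <tag></tag> rather than <tag/>.
    """
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(elem.tag, str):
            continue
        if len(elem) == 0 and elem.text is None:
            elem.text = ""
    return root


doc = etree.fromstring("<root><MemoryEnv/><foo>bar</foo></root>")
result = etree.tostring(expand_empty_elements(doc), encoding="unicode")
print(result)
```

Because the function modifies the tree in place, it can be called once right before serialization, and pretty_print=True can still be passed to etree.tostring() afterwards.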

An alternative is to use the write_c14n() method to write canonical XML (which does not use the special empty-element syntax) to a file.

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

doc.getroottree().write_c14n("out.xml")
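
To inspect the canonical output without creating a file on disk, the same call can be pointed at an in-memory buffer. This is a sketch that assumes write_c14n() accepts a writable binary file-like object in place of a file name; the variable names are illustrative.

```python
import io

from lxml import etree

doc = etree.fromstring("<root><MemoryEnv/><foo>bar</foo></root>")

# Canonical XML is always written as UTF-8 bytes, so use a bytes buffer.
buffer = io.BytesIO()
doc.getroottree().write_c14n(buffer)

canonical = buffer.getvalue()
print(canonical.decode("utf-8"))
```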

Serializing with the c14n method also works with lxml's tostring(); it does not collapse empty elements.

>>> from lxml import etree
>>> s = "<MemoryEnv></MemoryEnv>"
>>> root_element = etree.XML(s)
>>> etree.tostring(root_element, method="c14n")
b'<MemoryEnv></MemoryEnv>'
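
One thing to keep in mind with the c14n approach is that canonicalization does more than expand empty elements. For example, it also writes attributes in sorted order, so the result can differ from the input in other ways. The sketch below compares it with the text = '' approach on a made-up element; the attribute names z and a are purely illustrative.

```python
from lxml import etree

source = '<MemoryEnv z="1" a="2"/>'

# c14n: empty element is expanded, but attributes are also reordered.
via_c14n = etree.tostring(etree.XML(source), method="c14n")

# text = '': empty element is expanded and attribute order is preserved.
elem = etree.XML(source)
elem.text = ""
via_text = etree.tostring(elem)

print(via_c14n)
print(via_text)
```

If the consuming program is sensitive to anything beyond the empty-element form, setting text to an empty string is probably the less invasive of the two options.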
