

Keep lxml from creating self-closing tags

I have an (old) tool which does not understand self-closing tags like <STATUS/>. So, we need to serialize our XML files with explicit opening and closing tags like this: <STATUS></STATUS>.

Currently I have:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> etree.tostring(tree)
b'<ERROR>The status is <STATUS/>.</ERROR>'

How can I serialize with explicit opening and closing tags, so that the output is:

<ERROR>The status is <STATUS></STATUS>.</ERROR>

Solution

Given by wildwilhelm below:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> for status_elem in tree.xpath("//STATUS[string() = '']"):
...     status_elem.text = ""
>>> etree.tostring(tree)
b'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

It seems that the <STATUS> element's text attribute is None:

>>> tree[0]
<Element STATUS at 0x11708d4d0>
>>> tree[0].text
>>> tree[0].text is None
True

If you set the text attribute of the <STATUS> element to an empty string, you get what you're looking for:

>>> tree[0].text = ''
>>> etree.tostring(tree)
b'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

With this in mind, you can walk the element tree and fix up the text attributes before writing out your XML. Something like this:

# prevent creation of self-closing tags
for node in tree.iter():
    if node.text is None:
        node.text = ''
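The loop above changes the tree in place. If you would rather leave the original tree untouched, a reusable variant can work on a copy. This is a minimal sketch, assuming you pass an element (not an ElementTree); the function name `tostring_no_self_closing` is hypothetical, while `copy.deepcopy`, `iter(tag=etree.Element)` (which yields only elements, skipping comments and processing instructions) and `etree.tostring` are standard.

```python
import copy

from lxml import etree


def tostring_no_self_closing(root, **kwargs):
    """Hypothetical helper: serialize `root` with <TAG></TAG> instead of <TAG/>.

    Works on a deep copy so the caller's tree is left unchanged.
    Extra keyword arguments are passed through to etree.tostring().
    """
    root = copy.deepcopy(root)
    # Only real elements; comments and processing instructions are skipped.
    for node in root.iter(tag=etree.Element):
        # No text and no children would serialize as <TAG/>;
        # an empty-string text node forces an explicit closing tag.
        if node.text is None and len(node) == 0:
            node.text = ""
    return etree.tostring(root, **kwargs)


tree = etree.XML("<ERROR>The status is <STATUS></STATUS>.<CODE/></ERROR>")
print(tostring_no_self_closing(tree))
print(etree.tostring(tree))  # the original tree still uses <STATUS/>
```

Copying costs some memory for large documents; for a one-off export, the in-place loop above is simpler.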

If the lxml document you are serializing is HTML, you can use

etree.tostring(html_dom, method='html')

to prevent self-closing tags like <a />.
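A short illustration of the HTML serializer, assuming the document was parsed with `etree.HTML` (which wraps a fragment in <html><body>…</body></html>): with the default `method='xml'` an empty <a> comes out as <a/>, while `method='html'` writes <a></a>. HTML void elements such as <br> are still written without a closing tag in HTML mode.

```python
from lxml import etree

# etree.HTML parses with the HTML parser and adds <html><body> around the fragment
html_dom = etree.HTML("<p>See <a></a> and<br></p>")

as_xml = etree.tostring(html_dom)                  # default method='xml'
as_html = etree.tostring(html_dom, method="html")  # HTML serialization rules

print(as_xml)
print(as_html)
```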
