Keep lxml from creating self-closing tags
I have an (old) tool which does not understand self-closing tags like <STATUS/>. So, we need to serialize our XML files with opened/closed tags like this: <STATUS></STATUS>.
Currently I have:
>>> from lxml import etree
>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS/>.</ERROR>'
How can I serialize with opened/closed tags?
<ERROR>The status is <STATUS></STATUS>.</ERROR>
Solution
Given by wildwilhelm, below:
>>> from lxml import etree
>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> for status_elem in tree.xpath("//STATUS[string() = '']"):
...     status_elem.text = ""
...
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'
It seems like the <STATUS> tag gets assigned a text attribute of None:
>>> tree[0]
<Element STATUS at 0x11708d4d0>
>>> tree[0].text
>>> tree[0].text is None
True
If you set the text attribute of the <STATUS> tag to an empty string, you should get what you're looking for:
>>> tree[0].text = ''
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'
With this in mind, you can probably walk the element tree and fix up the text attributes before writing out your XML. Something like this:
# prevent creation of self-closing tags
for node in tree.iter():
    if node.text is None:
        node.text = ''
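The loop above can be wrapped into a small self-contained script. This is only a sketch: the helper name `expand_empty_elements` is made up for illustration, and restricting the fix to childless elements is an extra assumption that the original snippet does not make. Passing `tag=etree.Element` to `iter()` skips comments and processing instructions, and `encoding="unicode"` makes `tostring` return a `str` rather than bytes.

```python
from lxml import etree


def expand_empty_elements(root):
    """Give every empty element an empty-string text so that lxml
    serializes it as <TAG></TAG> instead of <TAG/>.

    Hypothetical helper, not part of lxml.
    """
    # iter(tag=etree.Element) yields elements only (no comments or PIs).
    for node in root.iter(tag=etree.Element):
        # Assumption: only touch elements with no text and no children.
        if node.text is None and len(node) == 0:
            node.text = ""
    return root


tree = etree.XML("<ERROR>The status is <STATUS/>.</ERROR>")
expand_empty_elements(tree)
serialized = etree.tostring(tree, encoding="unicode")
print(serialized)
```

Call the helper once, right before serializing, so that elements created or emptied later in processing are also covered.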
If the lxml tree you are serializing is HTML, you can use etree.tostring(html_dom, method='html') to prevent self-closing tags like <a />.
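A minimal sketch of the `method='html'` approach, assuming the document is parsed with `lxml.html`. The sample markup and the variable names `as_xml` and `as_html` are made up for illustration. With the HTML serializer, an empty `<a>` element should be written with an explicit closing tag, while void elements such as `<br>` are written without a trailing slash.

```python
from lxml import etree, html

# Parse a small HTML fragment containing an empty <a> element.
html_dom = html.fromstring('<div><a href="#top"></a><br></div>')

# Default XML serialization; empty elements may come out self-closed.
as_xml = etree.tostring(html_dom, encoding="unicode")

# HTML serialization; the empty <a> keeps its closing tag.
as_html = etree.tostring(html_dom, method="html", encoding="unicode")

print(as_xml)
print(as_html)
```

This only makes sense for real HTML content. For arbitrary XML such as the <STATUS> example, setting the text attribute to an empty string is likely the safer route, because the HTML serializer applies HTML rules to the output.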