
Close a tag with no text in lxml

I am trying to output an XML file using Python and lxml.

However, I notice that if an element has no text, it is serialized as a self-closing tag rather than as a separate opening and closing tag. For example:

from lxml import etree

root = etree.Element('document')
rootTree = etree.ElementTree(root)
firstChild = etree.SubElement(root, 'test')
print(etree.tostring(root, pretty_print=True).decode())

The output of this is:

<document>
<test/>
</document>

I want the output to be:

<document>
<test>
</test>
</document>

So basically I want an element that has no text, and is used only for its attribute values, to be written with an explicit closing tag. How do I do that? Also, what is such a tag called? I would have Googled it, but I don't know what to search for.

Note that <test></test> and <test/> mean exactly the same thing. What your desired output actually shows is a test element whose text consists of a single line break. An empty element with no text is usually written as <test/> (XML calls this an empty-element tag), and it makes very little sense to insist that it appear as <test></test>.
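
To see the equivalence for yourself, here is a minimal sketch that parses both spellings with lxml and compares the results. The variable names (short_form, long_form, same_serialization) are just for illustration.

```python
from lxml import etree

# Parse the two spellings of an empty element.
short_form = etree.fromstring('<test/>')
long_form = etree.fromstring('<test></test>')

# Both elements have no text and no children.
for element in (short_form, long_form):
    print(element.tag, element.text, len(element))

# Once parsed, lxml no longer remembers which spelling was used,
# so both serialize to the same bytes.
same_serialization = etree.tostring(short_form) == etree.tostring(long_form)
print(same_serialization)
```

Any XML parser that consumes your file should treat the two forms the same way, so the choice only matters if something downstream is reading the file as plain text.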

To clarify @ymv's answer, in case it helps others:

from lxml import etree

root = etree.Element('document')
rootTree = etree.ElementTree(root)
firstChild = etree.SubElement(root, 'test')

print(etree.tostring(root, method='html'))
# Output: b'<document><test></test></document>'

Use lxml.html.tostring to serialize to HTML:

import lxml.html

# mydocument is assumed to be a string containing the markup to parse
root = lxml.html.fromstring(mydocument)
print(lxml.html.tostring(root))

Use an empty string '' as the element's text, like this:

from lxml import etree
root = etree.Element('document')
etree.SubElement(root, 'test').text = ''
print(etree.tostring(root))
