XML string literal written to file is wrongly formatted

I'm using the following code to write XML string literals to an XML file:

from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("test.xml", parser)
root = tree.getroot()
phrase = '''
    <d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="test" d:title="test">
    <d:index d:value="test" d:title="test"/><d:index d:value="test2" d:title="test2"/>
    <div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>test</i></span>: <p>test <a></a>test</p> </div><p class="ref">See main entry:<a href="x-dictionary:d:test">test</a></p></div></div>
    </d:entry>'''
b = etree.fromstring(phrase)
root.insert(0, b)
tree.write("newtest.xml", xml_declaration=True, encoding='utf-8', pretty_print=False)

I'd like the XML string literal to be written to the file as is, i.e. on 4 lines, as follows:

<d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="{}" d:title="{}">
    <d:index d:value="{}" d:title="{}"/><d:index d:value="{}" d:title="{}"/>
    <div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>{}</i></span>: {}{}</div><p class="ref">See main entry:<a href="x-dictionary:d:{}">{}</a></p></div></div>
</d:entry>

But the resulting XML file shows that the parser has somehow reformatted the string literal into more of a hierarchy, which I don't need, and it takes up many more lines than I expected, as you can see in the picture below.

[Screenshot of the resulting newtest.xml]

The <d:entry> element is also in the wrong position: it should start at the beginning of a line.

I have tried using this parser with etree:

etree.XMLParser(remove_blank_text=True)

But this does not help at all. I don't know whether there is some other setting I'm missing that would make it work. Is anyone familiar with this?

Any input is much appreciated.

Here is the content of test.xml:

<?xml version="1.0" encoding="utf-8"?>
<d:dictionary xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="test0" d:title="test0">
<d:index d:value="test0" d:title="test0"/><d:index d:value="test00" d:title="test00"/>
<div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>test</i></span>: <p>test <a></a>test</p> </div><p class="ref">See main entry:<a href="x-dictionary:d:test">test</a></p></div></div>
</d:entry>
</d:dictionary>

I'm using Python 3.7 and lxml.

The value of phrase is a single, multi-line, triple-quoted string. Because it is one string, the indentation at the start of each line and the newline at the end of each line are part of its value. The parser keeps that whitespace as text inside the parsed elements, and it is written back out unchanged. This is what causes the formatting issues you see.
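The sketch below makes this concrete. It parses a shortened version of the question's phrase and prints the whitespace that ends up in the tree. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages. Both libraries store the characters between tags in an element's .text and .tail attributes in the same way.

```python
import xml.etree.ElementTree as ET

# Same shape as the question's phrase: one triple-quoted string whose
# line breaks and indentation are part of the string value.
phrase = '''
    <d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="test" d:title="test">
    <d:index d:value="test" d:title="test"/><d:index d:value="test2" d:title="test2"/>
    </d:entry>'''

entry = ET.fromstring(phrase)
first_index, second_index = entry  # the two <d:index> children

# The newline + indentation right after the opening <d:entry> tag is entry.text.
print(repr(entry.text))
# Nothing separates the two <d:index> elements, so the first has no tail.
print(repr(first_index.tail))
# The newline + indentation before </d:entry> is the second index's tail.
print(repr(second_index.tail))
```

When the tree is written without pretty printing, these whitespace strings are copied into the file exactly as they appear here.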

The simplest solution is to take advantage of the fact that Python automatically concatenates adjacent string literals. Wrap the value of phrase in parentheses and triple-quote each line separately.
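Adjacent string literals inside parentheses are joined at compile time with nothing inserted between them. As a result, the line breaks and indentation used to lay out the source code do not become part of the value. Here is a minimal illustration using a made-up fragment:

```python
# Each literal is opened and closed on its own line. Python joins adjacent
# literals at compile time, and the source line breaks are not included.
fragment = ("""<a x="1">"""
            """<b/>"""
            """</a>""")
print(fragment)  # <a x="1"><b/></a>
```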

phrase = ("""<d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="test" d:title="test">"""
          """<d:index d:value="test" d:title="test"/><d:index d:value="test2" d:title="test2"/>"""
          """<div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>test</i></span>: <p>test <a></a>test</p> </div><p class="ref">See main entry:<a href="x-dictionary:d:test">test</a></p></div></div>"""
          """</d:entry>""")

This removes the leading whitespace and newlines from the generated XML file.
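You can check this by parsing the concatenated string and confirming that the serialized element contains no newline. This is again a sketch using xml.etree.ElementTree instead of lxml, with the div block left out for brevity. ET.register_namespace is called only so that the output keeps the d: prefix instead of ElementTree's default ns0.

```python
import xml.etree.ElementTree as ET

D_NS = "http://www.apple.com/DTDs/DictionaryService-1.0.rng"
# Keep the "d" prefix in the output instead of the auto-generated "ns0".
ET.register_namespace("d", D_NS)

# Adjacent literals are concatenated, so no whitespace sits between the tags.
phrase = ("""<d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="test" d:title="test">"""
          """<d:index d:value="test" d:title="test"/><d:index d:value="test2" d:title="test2"/>"""
          """</d:entry>""")

entry = ET.fromstring(phrase)
serialized = ET.tostring(entry, encoding="unicode")
print(serialized)
# No whitespace-only text/tail was parsed, so none is written back.
print("\n" in serialized)  # False
```

Note that the whole entry is then serialized on a single line.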

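Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.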

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM