
XML string literal written to file is wrongly formatted

I'm using the following code to write an XML string literal to an XML file.

from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("test.xml", parser)
root = tree.getroot()
phrase = '''
    <d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="test" d:title="test">
    <d:index d:value="test" d:title="test"/><d:index d:value="test2" d:title="test2"/>
    <div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>test</i></span>: <p>test <a></a>test</p> </div><p class="ref">See main entry:<a href="x-dictionary:d:test">test</a></p></div></div>
    </d:entry>'''
b = etree.fromstring(phrase)
root.insert(0, b)
tree.write("newtest.xml", xml_declaration=True, encoding='utf-8', pretty_print=False)

I'd like the XML string literal to be written to the file as is, i.e. in 4 lines, as follows:

<d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="{}" d:title="{}">
    <d:index d:value="{}" d:title="{}"/><d:index d:value="{}" d:title="{}"/>
    <div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>{}</i></span>: {}{}</div><p class="ref">See main entry:<a href="x-dictionary:d:{}">{}</a></p></div></div>
</d:entry>

But in the resulting XML file the parser has somehow reformatted the string literal into a nested, indented structure, which I don't need, and the output has many more lines than I expected, as you can see in the picture below.

[Image: screenshot of the resulting XML file]

The <d:entry> start tag is in the wrong position too; it should begin at the start of a line.

I have tried passing this parser to etree:

etree.XMLParser(remove_blank_text=True)

But this does not help at all. Is there another setting I'm not aware of that would make this work? Is anyone familiar with this?

Any input is much appreciated.

Here's the content of the test.xml file:

<?xml version="1.0" encoding="utf-8"?>
<d:dictionary xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="test0" d:title="test0">
<d:index d:value="test0" d:title="test0"/><d:index d:value="test00" d:title="test00"/>
<div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>test</i></span>: <p>test <a></a>test</p> </div><p class="ref">See main entry:<a href="x-dictionary:d:test">test</a></p></div></div>
</d:entry>
</d:dictionary>

I'm using Python 3.7 and lxml.

The value of phrase is a single, multi-line, triple-quoted string. Because it is a single string, the whitespace at the beginning of each line and the newline at the end of each line are part of the string, and this is what causes the formatting issues that you see.
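
To make this visible, here is a minimal sketch that prints a triple-quoted string with repr(). The tags are a shortened, hypothetical stand-in for the entry in the question, not the real data.

```python
# Shortened, hypothetical stand-in for the phrase in the question.
phrase = '''
    <entry>
    <index/>
    </entry>'''

# repr() shows the characters that are normally invisible:
# each line break in the source is a "\n", followed by the indentation.
print(repr(phrase))

# Count the embedded newlines and check how the string starts.
newline_count = phrase.count("\n")
starts_with_newline = phrase.startswith("\n    <entry>")
```

When such a string is parsed, the whitespace between the tags is kept as text content of the elements, so it ends up in the written file as well.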

The simplest solution is to take advantage of the fact that Python automatically concatenates adjacent string literals. Wrap the value of phrase in parentheses and quote each line separately.

phrase = ("""<d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="test" d:title="test">"""
          """<d:index d:value="test" d:title="test"/><d:index d:value="test2" d:title="test2"/>"""
          """<div class="ODECN"><div class="extras"><div class="phrase"><span class="word_title"><i>test</i></span>: <p>test <a></a>test</p> </div><p class="ref">See main entry:<a href="x-dictionary:d:test">test</a></p></div></div>"""
          """</d:entry>""")

This removes the leading whitespace and the newlines from the string, so they no longer appear in the generated XML file.
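
As a rough illustration of the concatenation approach, the sketch below joins adjacent literals and checks that no newline is left. It also shows one possible alternative, assuming the line breaks should be kept and only the indentation dropped: textwrap.dedent from the standard library. The tags are again a shortened, hypothetical stand-in for the real entry.

```python
import textwrap

# Adjacent string literals inside parentheses are joined into one string,
# with nothing inserted between them.
joined = ("<entry>"
          "<index/>"
          "</entry>")
print(repr(joined))

# Alternative (an assumption, not part of the answer above): keep the
# triple-quoted form, remove the common indentation with dedent(), and
# strip the leading newline. The line breaks between the tags remain.
block = '''
    <entry>
    <index/>
    </entry>'''
dedented = textwrap.dedent(block).strip()
print(repr(dedented))
```

The first form puts the whole entry on a single line; the second keeps one tag per line without the leading spaces. Which one is appropriate depends on how the file is expected to look.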
