I have an XML file that I'm working with using lxml's etree, but when I add tags to it, pretty printing doesn't seem to work.
>>> from lxml import etree
>>> root = etree.parse('file.xml').getroot()
>>> print etree.tostring(root, pretty_print = True)
<root>
  <x>
    <y>test1</y>
  </x>
</root>
So far so good. But now
>>> x = root.find('x')
>>> z = etree.SubElement(x, 'z')
>>> etree.SubElement(z, 'z1').attrib['value'] = 'val1'
>>> print etree.tostring(root, pretty_print = True)
<root>
  <x>
    <y>test1</y>
  <z><z1 value="val1"/></z></x>
</root>
It's no longer pretty. I've also tried building it "backwards": creating the z1 tag first, then creating the z tag and appending z1 to it, then appending z to x. But I get the same result.
If I don't parse the file and just create all the tags in one go, it'll print correctly. So I think it has something to do with parsing the file.
How can I get pretty printing to work?
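It has to do with how lxml treats whitespace; see the lxml FAQ for details. When lxml parses a file, the whitespace between tags is kept in the tree as text, and the pretty printer leaves alone any element whose content already contains text. Your new <z> element is added to such an element, so it is written out without indentation.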
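You can see that whitespace by inspecting the `text` and `tail` of the parsed elements. The sketch below uses a hypothetical in-memory string `XML` as a stand-in for file.xml (parsed with `etree.fromstring` so it runs without a file) and compares the default parser with one created with `remove_blank_text=True`.

```python
from lxml import etree

# Hypothetical stand-in for file.xml, indented the way the question shows it.
XML = b"""<root>
  <x>
    <y>test1</y>
  </x>
</root>"""

# Default parser: the indentation between tags survives as text/tail strings.
x = etree.fromstring(XML).find('x')
print(repr(x.text))     # whitespace between <x> and <y>
print(repr(x[0].tail))  # whitespace between </y> and </x>

# remove_blank_text=True drops those whitespace-only strings while parsing,
# which leaves the pretty printer free to add its own indentation later.
parser = etree.XMLParser(remove_blank_text=True)
x2 = etree.fromstring(XML, parser).find('x')
print(repr(x2.text), repr(x2[0].tail))
```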
To fix this, change the part of your code that loads the file to the following:
parser = etree.XMLParser(remove_blank_text=True)
root = etree.parse('file.xml', parser).getroot()
I didn't test it, but it should indent your file just fine with this change.
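A self-contained way to check the fix is to repeat the question's steps with both parsers and compare the output. This is a sketch: `XML` is a hypothetical in-memory copy of file.xml, and `add_z_and_serialize` is a hypothetical helper; `encoding='unicode'` makes `tostring` return a `str` instead of bytes.

```python
from lxml import etree

# Hypothetical stand-in for the question's file.xml.
XML = b"""<root>
  <x>
    <y>test1</y>
  </x>
</root>"""

def add_z_and_serialize(parser=None):
    """Parse XML, add <z><z1 value="val1"/></z> under <x>, and pretty-print."""
    root = etree.fromstring(XML, parser)
    x = root.find('x')
    z = etree.SubElement(x, 'z')
    etree.SubElement(z, 'z1').attrib['value'] = 'val1'
    return etree.tostring(root, pretty_print=True, encoding='unicode')

# Default parser: the new <z> ends up crammed onto the </x> line.
before = add_z_and_serialize()
# Parser that drops whitespace-only text: the whole tree gets reindented.
after = add_z_and_serialize(etree.XMLParser(remove_blank_text=True))
print(before)
print(after)
```

Only the parser changes between the two calls; the element-building code is identical, which is consistent with the problem coming from the parsed whitespace rather than from how the new tags are added.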
I was having the same issue when writing to files. For anyone else with this issue, I created a helper function that pretty-prints the file after my main function runs.
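from lxml import etree

def ppxml(xml):
    # Re-parse the file without whitespace-only text so lxml can reindent it
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(xml, parser)
    # Overwrite the same file with the pretty-printed tree
    tree.write(xml, encoding='utf-8', pretty_print=True, xml_declaration=True)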
In my main program file:
if __name__ == '__main__':
    main()
    ppxml(xml)  # xml is the path of the file that main() wrote
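Here is a runnable sketch of that two-pass approach. The `ppxml` function is the one above; everything else (the temporary file and the stand-in "main" step that appends `<z>` to an already-indented file) is hypothetical scaffolding for the demonstration.

```python
import os
import tempfile
from lxml import etree

def ppxml(path):
    """Re-parse an XML file without blank text and rewrite it pretty-printed."""
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(path, parser)
    tree.write(path, encoding='utf-8', pretty_print=True, xml_declaration=True)

# Hypothetical input file, indented like the question's file.xml.
fd, path = tempfile.mkstemp(suffix='.xml')
os.close(fd)
with open(path, 'wb') as f:
    f.write(b"<root>\n  <x>\n    <y>test1</y>\n  </x>\n</root>\n")

# Hypothetical "main" step: parse with the default parser and add elements,
# which leaves <z> unindented in the written file.
tree = etree.parse(path)
z = etree.SubElement(tree.getroot().find('x'), 'z')
etree.SubElement(z, 'z1').attrib['value'] = 'val1'
tree.write(path, encoding='utf-8', pretty_print=True)

ppxml(path)  # second pass reindents the whole file

with open(path, encoding='utf-8') as f:
    result = f.read()
os.remove(path)
print(result)
```

The trade-off is that the file is parsed and written twice; parsing with `remove_blank_text=True` in the first place avoids the second pass.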