
How to pretty print an xml file in Python?

I'd like to tidy a complicated XML file using lxml. The problem is that many of its elements have tail text. For example, given XML like this:

 <body><part>n</part> attend </body>

I want to tidy it into this:

 <body>
    <part>n</part> attend 
 </body>

At first I tried pretty_print together with a parser created with remove_blank_text=True, but the output was unchanged:

import lxml.etree as ET
xml_doc = '<body><part>n</part> attend </body>'
parser = ET.XMLParser(remove_blank_text=True)
root = ET.fromstring(xml_doc, parser)
print(ET.tostring(root, pretty_print=True))
>>>b'<body><part>n</part> attend </body>\n'

Then I tried again without the custom parser, to no avail:

import lxml.etree as ET
xml_doc = '<body><part>n</part> attend </body>'
root = ET.fromstring(xml_doc)
print(ET.tostring(root, pretty_print=True))
>>>b'<body><part>n</part> attend </body>\n'

If the pretty_print option does not help, you can write your own recursive function to do the pretty printing. Something along the lines of:


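def pprint(root, indent_tabs=0):
    # Print the opening tag, the element's own text, each child, then the closing tag.
    pad = indent_tabs * "\t"
    print("%s<%s>" % (pad, root.tag))
    if root.text and root.text.strip():
        print(pad + "\t" + root.text.strip())
    for element in root:  # iterating over an element yields its children
        pprint(element, indent_tabs + 1)
    print("%s</%s>" % (pad, root.tag))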

There may be ready-made options for this already. The function above handles only tags and element text; it ignores tail text (such as " attend " in your example). If your XML has attributes, you will need to add code to print those as well.

EDIT: The above will print in the format

<tag>
    text
</tag>

You can modify it further according to the output you need.
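For example, here is a minimal sketch of one such modification that keeps the tail text and produces the layout asked for in the question. It edits the tree in place instead of printing. The function name indent_keep_tails and its parameters are made up for this illustration, and it uses the standard library's xml.etree.ElementTree so that it runs without lxml. I'd expect it to work on lxml elements too, since they expose the same text, tail and iteration interface, but I have not tested that here.

```python
import xml.etree.ElementTree as ET


def indent_keep_tails(elem, level=0, space="    "):
    """Indent elem in place; existing tail text is kept and only padded."""
    children = list(elem)
    if not children:
        return  # leaf element: leave its text untouched
    child_pad = "\n" + space * (level + 1)  # whitespace placed before a child
    close_pad = "\n" + space * level        # whitespace before elem's closing tag
    # Whitespace-only (or missing) text before the first child becomes indentation.
    if not (elem.text and elem.text.strip()):
        elem.text = child_pad
    for i, child in enumerate(children):
        indent_keep_tails(child, level + 1, space)
        # The last child's tail is followed by the parent's closing tag.
        pad = close_pad if i == len(children) - 1 else child_pad
        if child.tail and child.tail.strip():
            child.tail = child.tail + pad  # keep the tail text, append the line break
        else:
            child.tail = pad


root = ET.fromstring("<body><part>n</part> attend </body>")
indent_keep_tails(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Note that this appends whitespace to the tail text, so the text content of the document changes slightly. Calling it twice on the same tree adds the padding twice.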

I ran into the same problem, and using tounicode() solved it for me.

print(ET.tounicode(root, pretty_print=True))
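The attempts in the question print a bytes object, which is why the result shows up as b'...' with a literal \n at the end. Here is a small sketch of the difference between bytes and text output. It uses the standard library's xml.etree.ElementTree so that it runs without lxml (that substitution is my assumption for the example). In lxml, tounicode() is documented as deprecated in favor of tostring(el, encoding="unicode"). Serializing to text does not by itself reindent elements that have tail text.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<body><part>n</part> attend </body>")

as_bytes = ET.tostring(root)                     # bytes: print() shows b'...'
as_text = ET.tostring(root, encoding="unicode")  # str: print() shows the markup itself

print(as_bytes)
print(as_text)
```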
