
How do I get Python's ElementTree to pretty print to an XML file?

Background

I am using SQLite to access a database and retrieve the desired information. I'm using ElementTree in Python version 2.6 to create an XML file with that information.

Code

import sqlite3
import xml.etree.ElementTree as ET

# NOTE: Omitted code where I access the database,
# pull data, and add elements to the tree

tree = ET.ElementTree(root)

# Pretty printing to Python shell for testing purposes
from xml.dom import minidom
print minidom.parseString(ET.tostring(root)).toprettyxml(indent = "   ")

#######  Here lies my problem  #######
tree.write("New_Database.xml")

Attempts

I've tried using tree.write("New_Database.xml", "utf-8") in place of the last line of code above, but it did not change the XML's layout at all; it's still a jumbled mess.

I also decided to fiddle around and tried doing:
tree = minidom.parseString(ET.tostring(root)).toprettyxml(indent = " ")
instead of printing this to the Python shell, which gives the error AttributeError: 'unicode' object has no attribute 'write'.

Questions

When I write my tree to an XML file on the last line, is there a way to pretty print to the XML file as it does to the Python shell?

Can I use toprettyxml() here or is there a different way to do this?

Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to it.

from xml.dom import minidom

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)

There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), then you may want to encode it to a suitable file encoding, such as UTF-8. E.g. replace the one write line with:

f.write(xmlstr.encode('utf-8'))
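On Python 3, one way to sidestep the str-versus-bytes question is to ask toprettyxml() for encoded bytes and open the file in binary mode. The following is only a sketch under that assumption; the helper name write_pretty and the sample element are hypothetical stand-ins for the tree built from the database.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.dom import minidom


def write_pretty(root, path, indent="   "):
    """Serialize root with minidom indentation and write UTF-8 bytes to path."""
    rough = ET.tostring(root, encoding="utf-8")
    # With an explicit encoding, toprettyxml() returns bytes rather than str.
    pretty = minidom.parseString(rough).toprettyxml(indent=indent, encoding="utf-8")
    with open(path, "wb") as f:
        f.write(pretty)


# Hypothetical sample data standing in for the real tree.
root = ET.Element("root")
ET.SubElement(root, "item", name="café").text = "1"

out_path = os.path.join(tempfile.mkdtemp(), "New_Database.xml")
write_pretty(root, out_path)
```

Because the bytes are already UTF-8, the non-ASCII attribute value is written without any further encode() call.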

I simply solved it with the indent() function:

xml.etree.ElementTree.indent(tree, space="  ", level=0)
Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.

tree = ET.ElementTree(root)
ET.indent(tree, space="\t", level=0)
tree.write(file_name, encoding="utf-8")

Note that the indent() function was added in Python 3.9.
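To see what indent() actually changes, here is a small sketch (Python 3.9+ assumed; the row and name elements are made-up sample data). It modifies the tree in place by filling in the text and tail whitespace of the elements, so the serialized string comes out indented.

```python
import xml.etree.ElementTree as ET

# Made-up sample tree.
root = ET.Element("root")
row = ET.SubElement(root, "row", id="1")
ET.SubElement(row, "name").text = "Alice"

before = ET.tostring(root, encoding="unicode")

# indent() edits the tree in place and returns None.
ET.indent(root, space="  ")

after = ET.tostring(root, encoding="unicode")
print(before)
print(after)
```

The first print shows everything on one line; the second shows one element per line, indented by two spaces per level.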

I found a way using straight ElementTree, but it is rather complex.

ElementTree lets you edit the text and tail of elements, for example element.text = "text" and element.tail = "tail". You have to use these in a specific way to get things to line up, so make sure you know your escape characters.

As a basic example:

I have the following file:

<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data>
</root>

To place a third element in and keep it pretty, you need the following code:

addElement = ET.Element("data")             # Make a new element
addElement.set("version", "3")              # Set the element's attribute
addElement.tail = "\n"                      # Edit the element's tail
addElement.text = "\n\t\t"                  # Edit the element's text
newData = ET.SubElement(addElement, "data") # Make a subelement and attach it to our element
newData.tail = "\n\t"                       # Edit the subelement's tail
newData.text = "5431"                       # Edit the subelement's text
root[-1].tail = "\n\t"                      # Edit the previous element's tail, so that our new element is properly placed
root.append(addElement)                     # Add the element to the tree.

To indent the internal tags (like the internal data tag), you have to add the whitespace to the text of the parent element. If you want to indent anything after an element (usually after subelements), you put it in that element's tail.

This code gives the following result when you write it to a file:

<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data> <!--root[-1].tail-->
    <data version="3"> <!--addElement's text-->
        <data>5431</data> <!--newData's tail-->
    </data> <!--addElement's tail-->
</root>

As another note, if you want the program to use \t uniformly, you may want to read the file as a string first and replace all of the indentation spaces with \t.

This code was written in Python 3.7, but it also works in Python 2.7.
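The text/tail distinction used above is easier to see on a tiny tree. This is only an illustrative sketch with a made-up two-element document: text is whatever follows an element's opening tag, and tail is whatever follows its closing tag.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
child = ET.SubElement(root, "data")
child.text = "5431"

# Whitespace right after <root>, i.e. before the first child.
root.text = "\n\t"
# Whitespace right after </data>, i.e. before </root>.
child.tail = "\n"

serialized = ET.tostring(root, encoding="unicode")
print(repr(serialized))
```

Setting root.text indents the child, while child.tail places the closing root tag on its own line.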

Install bs4

pip install bs4

Use this code to pretty print (the "xml" parser also requires lxml to be installed):

from bs4 import BeautifulSoup

x = "<root><child>text</child></root>"  # replace with your XML string

print(BeautifulSoup(x, "xml").prettify())

If one wants to use lxml, it could be done in the following way:

from lxml import etree

xml_object = etree.tostring(root,
                            pretty_print=True,
                            xml_declaration=True,
                            encoding='UTF-8')

with open("xmlfile.xml", "wb") as writer:
    writer.write(xml_object)

If you see XML namespaces, e.g. py:pytype="TREE", you may want to add the following before creating xml_object:

etree.cleanup_namespaces(root) 

This should be sufficient for any adaptation in your code.这对于您的代码中的任何调整都应该足够了。

Riffing on Ben Anderson's answer, as a function:

def _pretty_print(current, parent=None, index=-1, depth=0):
    for i, node in enumerate(current):
        _pretty_print(node, current, i, depth + 1)
    if parent is not None:
        if index == 0:
            parent.text = '\n' + ('\t' * depth)
        else:
            parent[index - 1].tail = '\n' + ('\t' * depth)
        if index == len(parent) - 1:
            current.tail = '\n' + ('\t' * (depth - 1))

So running the test on unpretty data:

import xml.etree.ElementTree as ET
root = ET.fromstring('''<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1"><data>76939</data>
</data><data version="2">
        <data>266720</data><newdata>3569</newdata>
    </data> <!--root[-1].tail-->
    <data version="3"> <!--addElement's text-->
<data>5431</data> <!--newData's tail-->
    </data> <!--addElement's tail-->
</root>
''')
_pretty_print(root)

tree = ET.ElementTree(root)
tree.write("pretty.xml")
with open("pretty.xml", 'r') as f:
    print(f.read())

We get:

<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data>
    <data version="3">
        <data>5431</data>
    </data>
</root>

Take a look at the vkbeautify module.

Input and output can be any combination of string and file. It is very compact and has no dependencies.

import vkbeautify as vkb

a) pretty_text = vkb.xml(your_xml_text)  #return String   

b) vkb.xml(your_xml_text, 'path/to/dest/file') #save in file 

One-liner(*) to read, parse (once) and pretty print XML from a file named fname:

from xml.dom import minidom
print(minidom.parseString(open(fname).read()).toprettyxml(indent="  "))

(* not counting the import)
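One caveat with running toprettyxml() on a file that is already indented: minidom keeps the existing whitespace as text nodes and adds its own on top, which tends to leave whitespace-only lines in the output. A common workaround, sketched here with a hypothetical helper named reprettify, is to drop those lines afterwards. Note that this would also drop whitespace-only lines inside real text content, so treat it as an assumption that your data has none.

```python
from xml.dom import minidom


def reprettify(xml_text, indent="  "):
    """Pretty print XML that may already contain indentation whitespace."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent=indent)
    # Keep only lines that contain something other than whitespace.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


already_indented = "<root>\n  <a>1</a>\n</root>"
raw = minidom.parseString(already_indented).toprettyxml(indent="  ")
cleaned = reprettify(already_indented)
print(cleaned)
```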

Using pure ElementTree and Python 3.9+:

import xml.etree.ElementTree as ET

def prettyPrint(element):
    encoding = 'UTF-8'
    # Create a copy of the input element: convert to string, then parse again
    copy = ET.fromstring(ET.tostring(element))
    # Format the copy in place. This needs Python 3.9+
    ET.indent(copy, space="    ", level=0)
    # tostring() returns bytes when given an encoding, so decode to get a str
    return ET.tostring(copy, encoding=encoding).decode(encoding)

If you need a file, replace the last line with ET.ElementTree(copy).write(...) to avoid the extra overhead (copy is an Element, which has no write() method of its own).
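A file-writing variant might look like the sketch below. It assumes Python 3.9+ and swaps the serialize-and-reparse copy for copy.deepcopy, which is a substitution on my part rather than part of the answer; the helper name write_pretty_copy and the sample element are hypothetical.

```python
import copy
import os
import tempfile
import xml.etree.ElementTree as ET


def write_pretty_copy(element, path, space="    "):
    """Write an indented copy of element to path, leaving element untouched."""
    clone = copy.deepcopy(element)
    ET.indent(clone, space=space)  # needs Python 3.9+
    ET.ElementTree(clone).write(path, encoding="utf-8", xml_declaration=True)


# Made-up sample element.
root = ET.Element("root")
ET.SubElement(root, "data").text = "76939"

pretty_path = os.path.join(tempfile.mkdtemp(), "pretty.xml")
write_pretty_copy(root, pretty_path)
```

Working on a copy keeps the in-memory tree free of the extra whitespace, which matters if the tree is processed further after being saved.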
