
How to pretty print XML in Python without generating a DOM tree?

Original post: 2015-09-17 06:10:56 | Tags: python, xml

Generating a DOM tree is too expensive for very large XML data. Is there a way to pretty-print the data without building one? I am using Python 2.7.

1 solution

Whatever the language, the way to parse an XML document without generating a tree is to use an event-oriented parser. With this kind of parser, you supply event handlers that the parser calls at specific points during processing: the beginning of a node, the end of a node, the beginning of data, and so on.
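To make the handler idea concrete, here is a minimal sketch using `xml.sax` from the standard library. The class name `EventLogger` and the helper `log_events` are made up for illustration; the callbacks `startElement`, `endElement`, and `characters` are the standard `ContentHandler` methods. It only records which events fire and never builds a tree.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler: records each parser callback as a tuple."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        # Called when an opening tag has been read.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called when the matching closing tag has been read.
        self.events.append(("end", name))

    def characters(self, content):
        # Called for text; whitespace-only chunks are skipped here.
        if content.strip():
            self.events.append(("text", content.strip()))


def log_events(xml_bytes):
    """Parse a bytes document and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

Note that the parser is free to deliver one run of text in several `characters` calls, so a real handler should not assume one call per text node.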

So you can use that kind of parser, start a new line each time a node begins, increase the indentation when you enter a node, and decrease it when you exit. Because these parsers only see one event at a time, it is tricky to look ahead, for example to check whether a node fits on one line, so the output may not be as pretty as what you get when working with a tree. It can be done, but it would be complicated.
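The indentation scheme described above could be sketched roughly as follows with `xml.sax`. The names `IndentingPrinter` and `pretty_print` are hypothetical, and the sketch makes simplifying assumptions: whitespace-only text is dropped, other text is stripped and put on its own line, and comments, processing instructions, and the DOCTYPE are not reproduced. It is written against the Python 3 standard library; the same `xml.sax` module is also available in Python 2.7.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class IndentingPrinter(xml.sax.ContentHandler):
    """Hypothetical handler: writes one indented line per event."""

    def __init__(self, out, indent="  "):
        xml.sax.ContentHandler.__init__(self)
        self.out = out
        self.indent = indent
        self.depth = 0  # current nesting level, the only state kept

    def startElement(self, name, attrs):
        attr_text = "".join(
            " %s=%s" % (key, quoteattr(value)) for key, value in attrs.items()
        )
        self.out.write("%s<%s%s>\n" % (self.indent * self.depth, name, attr_text))
        self.depth += 1  # entering a node: indent further

    def endElement(self, name):
        self.depth -= 1  # leaving a node: indent less
        self.out.write("%s</%s>\n" % (self.indent * self.depth, name))

    def characters(self, content):
        text = content.strip()
        if text:  # assumption: whitespace-only text is not significant
            self.out.write("%s%s\n" % (self.indent * self.depth, escape(text)))


def pretty_print(source, out):
    """Stream `source` (a filename or binary file object) to `out`."""
    xml.sax.parse(source, IndentingPrinter(out))
```

A call such as `pretty_print("big.xml", sys.stdout)` (with `big.xml` standing in for your own file) would stream the document through, holding only the current depth in memory. As the answer warns, nothing here looks ahead, so even a short element like `<b>hi</b>` is spread over three lines.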

In Python, three event-driven parsers come with the standard library (in no particular order):

ElementTree.iterparse() (in the xml.etree.ElementTree module)
pyexpat (exposed as xml.parsers.expat)
sax (the xml.sax package; SAX is a well-known event-driven XML parsing API)
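Of the three, `iterparse()` differs in that it still creates element objects as it reads, so memory stays low only if you discard elements once you are done with them. Below is a small sketch of tracking nesting depth with it. The function name `element_depths` is made up for illustration; `iterparse`, its `events` argument, and `Element.clear()` are the standard API.

```python
import io
import xml.etree.ElementTree as ET


def element_depths(source):
    """Yield (depth, tag) for each element as its start tag is read."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield depth, elem.tag
            depth += 1
        else:
            depth -= 1
            # Drop the element's text and children now that it is complete.
            elem.clear()
```

Even with `clear()`, the emptied elements stay attached to their parents, so for documents with millions of siblings the SAX or expat route keeps less in memory.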