
Python xml minidom. Generate <text>Some text</text> element

I have the following code.

from xml.dom.minidom import Document

doc = Document()

root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)

text = doc.createTextNode('Some text here')
main.appendChild(text)

print doc.toprettyxml(indent='\t')

The result is:

<?xml version="1.0" ?>
<root>
    <Text>
        Some text here
    </Text>
</root>

This is all fine and dandy, but what if I want the output to look like this?

<?xml version="1.0" ?>
<root>
    <Text>Some text here</Text>
</root>

Can this easily be done?

Orjanp...

Can this easily be done?

It depends on what exact rule you want, but generally no: you get little control over pretty-printing. If you want a specific format, you'll usually have to write your own walker.
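As a rough illustration of what such a walker could look like, here is a minimal sketch written for Python 3. The function name serialize and its parameters are made up for this example, and it only handles elements, attributes, and text nodes (comments, CDATA, and processing instructions are skipped), so treat it as a starting point rather than a complete serializer.

```python
from xml.dom import Node
from xml.dom.minidom import Document
from xml.sax.saxutils import escape, quoteattr


def serialize(node, indent="    ", level=0):
    """Return node as indented XML, keeping text-only elements on one line.

    Hypothetical helper for illustration; not part of minidom.
    """
    pad = indent * level
    if node.nodeType == Node.TEXT_NODE:
        # Text mixed in with child elements: drop whitespace-only nodes.
        text = node.data.strip()
        return pad + escape(text) + "\n" if text else ""
    if node.nodeType != Node.ELEMENT_NODE:
        return ""  # comments, processing instructions, etc. are ignored here

    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in node.attributes.items()
    )
    children = node.childNodes
    if not children:
        return "%s<%s%s/>\n" % (pad, node.tagName, attrs)
    if len(children) == 1 and children[0].nodeType == Node.TEXT_NODE:
        # The rule asked for in the question: no whitespace around lone text.
        return "%s<%s%s>%s</%s>\n" % (
            pad, node.tagName, attrs, escape(children[0].data), node.tagName)
    inner = "".join(serialize(child, indent, level + 1) for child in children)
    return "%s<%s%s>\n%s%s</%s>\n" % (
        pad, node.tagName, attrs, inner, pad, node.tagName)


# Rebuild the document from the question and serialize its root element.
doc = Document()
root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)
main.appendChild(doc.createTextNode('Some text here'))

print(serialize(doc.documentElement))
```

The XML declaration is not emitted by this sketch; prepend it yourself if you need one.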

The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However, it (currently) uses two spaces for an indent rather than four.

>>> doc= pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>c</b>
</a>

(Adjust encoding and output format for whatever type of serialisation you're doing.)

If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, e.g.:

>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):
...     if len(self.childNodes)==1 and self.firstChild.nodeType==3:
...         writer.write(indent)
...         self.oldwritexml(writer) # cancel extra whitespace
...         writer.write(newl)
...     else:
...         self.oldwritexml(writer, indent, addindent, newl)
... 
>>> minidom.Element.oldwritexml= minidom.Element.writexml
>>> minidom.Element.writexml= newwritexml

All the usual caveats about the badness of monkey-patching apply.
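One way to limit those caveats, sketched here for Python 3, is to apply the patch only for the duration of a single serialisation and restore the original method afterwards. The context manager name compact_text_elements is invented for this example; the patched logic is the same rule as above.

```python
from contextlib import contextmanager
from xml.dom import minidom, Node


@contextmanager
def compact_text_elements():
    """Temporarily patch minidom.Element.writexml, restoring it on exit.

    Hypothetical helper for illustration; not part of minidom.
    """
    original = minidom.Element.writexml

    def writexml(self, writer, indent="", addindent="", newl=""):
        only_text = (len(self.childNodes) == 1
                     and self.firstChild.nodeType == Node.TEXT_NODE)
        if only_text:
            writer.write(indent)
            original(self, writer)  # no indent/newline arguments: one line
            writer.write(newl)
        else:
            original(self, writer, indent, addindent, newl)

    minidom.Element.writexml = writexml
    try:
        yield
    finally:
        minidom.Element.writexml = original  # always undo the patch


doc = minidom.parseString("<root><Text>Some text here</Text></root>")
with compact_text_elements():
    pretty = doc.toprettyxml(indent="    ")
print(pretty)
```

Outside the with block, minidom behaves exactly as before, so other code that relies on the stock output is not affected.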

I was looking for exactly the same thing, and I came across this post. (The indenting provided by xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed the output to be indented.) I tried the accepted solution with a slightly more complex example, and this was the result:

In [1]: import pxdom

In [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'

In [3]: doc = pxdom.parseString(xml)

In [4]: doc.domConfig.setParameter('format-pretty-print', True)

In [5]: print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>fda</b><c>
    <b>fdsa</b>
  </c>
</a>

The pretty-printed XML isn't formatted correctly, and I'm not too happy about monkey patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.

I'm writing the output to a file, so I can use the xmlindent program for Ubuntu (sudo aptitude install xmlindent). So I just write the unformatted XML to the file, then call xmlindent from within the Python program:

from subprocess import Popen, PIPE
Popen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name], 
      stderr=PIPE, 
      stdout=PIPE).communicate()

The -w switch causes the file to be overwritten, but annoyingly leaves behind a file named, e.g., "myfile.xml~", which you'll probably want to remove. The other switches are there to get the formatting right (for me).

P.S. xmlindent is a stream formatter, i.e. you can use it as follows:

cat myfile.xml | xmlindent > myfile_indented.xml

So you might be able to use it in a Python script without writing to a file if you needed to.
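A minimal sketch of that idea for Python 3.7+, untested against xmlindent itself: the XML is sent to the formatter's standard input and the result is read back from its standard output. The function name indent_via_filter is invented here, and the default command assumes xmlindent is on the PATH and behaves as in the pipe example above. The passthrough command is only a stand-in so that the sketch can be tried without xmlindent installed.

```python
import subprocess
import sys


def indent_via_filter(xml_text, command=("xmlindent",)):
    """Pipe xml_text through an external stream formatter and return its output.

    Hypothetical helper; the default command assumes xmlindent is installed.
    """
    result = subprocess.run(
        list(command),
        input=xml_text,
        capture_output=True,
        text=True,    # exchange str rather than bytes
        check=True,   # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in filter that copies stdin to stdout unchanged.
passthrough = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read())"]
print(indent_via_filter("<a><b>c</b></a>", command=passthrough))

# With xmlindent installed, the call would simply be:
#     indented = indent_via_filter(doc.toxml())
```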

This could be done with toxml(), using regular expressions to tidy things up.

>>> from xml.dom.minidom import Document
>>> import re
>>> doc = Document()
>>> root = doc.createElement('root')
>>> _ = doc.appendChild(root)
>>> main = doc.createElement('Text')
>>> _ = root.appendChild(main)
>>> text = doc.createTextNode('Some text here')
>>> _ = main.appendChild(text)
>>> out = doc.toxml()
>>> niceOut = re.sub(r'><', r'>\n<', re.sub(r'(<\/.*?>)', r'\1\n', out))
>>> print niceOut
<?xml version="1.0" ?>
<root>
<Text>Some text here</Text>
</root>

This solution worked for me without monkey patching or ceasing to use minidom:

from xml.dom.ext import PrettyPrint
from StringIO import StringIO

def toprettyxml_fixed (node, encoding='utf-8'):
    tmpStream = StringIO()
    PrettyPrint(node, stream=tmpStream, encoding=encoding)
    return tmpStream.getvalue()

http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/#best-solution

The easiest way to do this is to use toprettyxml() and remove the \n and \t inside the tags. That way you keep the indentation you required in your example.

import re

xml_output = doc.toprettyxml()
nojunkintags = re.sub('>(\n|\t)</', '', xml_output)
print nojunkintags
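Note that the pattern above only matches a single newline or tab sitting directly between > and </, so it does not touch output like the one in the question, where the text itself lies between the two runs of whitespace. A variant that captures the text and strips the whitespace on both sides might look like the sketch below (Python 3). The function name collapse_text_only_elements is made up for this example, and the sample string is the question's output as produced with tab indentation.

```python
import re


def collapse_text_only_elements(pretty_xml):
    """Remove the whitespace placed around text-only element content.

    Hypothetical helper: matches '>', whitespace, text without markup,
    whitespace, '</' and keeps only the text.
    """
    return re.sub(r'>\s+([^<>\s][^<>]*?)\s+</', r'>\1</', pretty_xml)


sample = (
    '<?xml version="1.0" ?>\n'
    '<root>\n'
    '\t<Text>\n'
    '\t\tSome text here\n'
    '\t</Text>\n'
    '</root>\n'
)
print(collapse_text_only_elements(sample))
```

Because this works on the serialised string, it also strips leading and trailing whitespace that was genuinely part of the text, so it is only safe when that whitespace is not significant.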

The pyxml package offers a simple solution to this by using the xml.dom.ext.PrettyPrint() function. It can also print to a file descriptor.

But the pyxml package is no longer maintained.
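For readers on Python 3.9 or later, one maintained route that stays within the standard library is to hand the minidom output to xml.etree.ElementTree and use its indent() function, which leaves text-only elements on one line. This swaps in a different technique from PrettyPrint(); the wrapper name pretty_from_minidom is invented for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document


def pretty_from_minidom(doc, space="    "):
    """Re-parse a minidom document with ElementTree and indent it.

    Hypothetical helper; ET.indent() requires Python 3.9 or later.
    """
    root = ET.fromstring(doc.documentElement.toxml())
    ET.indent(root, space=space)  # inserts whitespace between elements only
    return ET.tostring(root, encoding="unicode")


# Rebuild the document from the question.
doc = Document()
root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)
main.appendChild(doc.createTextNode('Some text here'))

print(pretty_from_minidom(doc))
```

The result has no XML declaration, so add one yourself if the consumer needs it.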

Oerjan Pettersen
