
Why does ElementTree in Python add extra new lines and spaces to XML?

How can I change the appearance of my XML from, e.g.:

 <root>
     <elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>
     <elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>   
 </root>

to something that looks like:

 <root>
     <elem1>
         <value>122</value>
         <text>This_is_just_a_text</text>
     </elem1>
     <elem1>
         <value>122</value>
         <text>This_is_just_a_text</text>
     </elem1>   
 </root>

I'm just wondering what causes this to happen. By the way, the function below is what I use to add the indentation:

import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")
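A minimal reproduction of the symptom, assuming a current Python 3 interpreter. It runs the same steps as the prettify() above on two made-up documents: one with no whitespace between tags, and one that is already indented. Whatever whitespace is in the input survives the ET.tostring() / minidom.parseString() round trip as text nodes, and toprettyxml() adds its own indentation around them.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

def prettify(elem):
    # Same steps as the prettify() in the question
    rough_string = ET.tostring(elem, "utf-8")
    return minidom.parseString(rough_string).toprettyxml(indent="\t")

# Hypothetical sample documents
compact = ET.fromstring("<root><value>122</value></root>")
indented = ET.fromstring("<root>\n  <value>\n    122\n  </value>\n</root>")

pretty_compact = prettify(compact)    # text stays on one line with its tags
pretty_indented = prettify(indented)  # old whitespace + new indentation

print(pretty_compact)
print(pretty_indented)
```

With the compact input the text stays next to its tags; with the indented input the text is spread over several lines and whitespace-only lines appear, which matches what the question shows.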

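An Element holds its contained text (and its tail) in a regular str, so you can call str.strip() to get rid of the unwanted whitespace. That whitespace comes from the indentation of the source document: ElementTree keeps it verbatim, and toprettyxml() then adds its own indentation on top of it.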
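A quick way to see that whitespace is to print the text and tail strings with repr(). The fragment below is a trimmed-down, hypothetical version of the question's XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical indented fragment, similar to the question's input
root = ET.fromstring("<elem1>\n    <value>\n        122\n    </value>\n</elem1>")
value = root.find("value")

# The source indentation is stored verbatim in .text and .tail
print(repr(root.text))   # '\n    '
print(repr(value.text))  # '\n        122\n    '
print(repr(value.tail))  # '\n'

# str.strip() leaves only the meaningful content
print(repr(value.text.strip()))  # '122'
```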

import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

def strip(elem):
    """Strip leading/trailing whitespace from the text and tail of every element."""
    for node in elem.iter():
        if node.text:
            node.text = node.text.strip()
        if node.tail:
            node.tail = node.tail.strip()

xml = ET.XML('''<elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>''')

strip(xml)
print(prettify(xml))

Result:

<?xml version="1.0" ?>
<elem1>
    <value>122</value>
    <text>This_is_just_a_text</text>
</elem1>
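On Python 3.9 or later, ElementTree has its own xml.etree.ElementTree.indent(), so the minidom round trip can be skipped. This is a minimal sketch assuming Python 3.9+; strip_whitespace is a hypothetical helper doing the same job as strip() above. ET.indent() only rewrites whitespace-only text and tails of elements that have children, so leaf text such as the padded "122" still has to be stripped first.

```python
import xml.etree.ElementTree as ET

def strip_whitespace(root):
    # Hypothetical helper: same idea as strip() in the answer above
    for node in root.iter():
        if node.text:
            node.text = node.text.strip()
        if node.tail:
            node.tail = node.tail.strip()

root = ET.fromstring("""<elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>""")

strip_whitespace(root)
ET.indent(root, space="\t")  # available since Python 3.9
pretty = ET.tostring(root, encoding="unicode")  # str, no XML declaration
print(pretty)
```

Unlike toprettyxml(), this output has no XML declaration and no trailing newline.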

I'm writing this answer for anyone who might run into the same problem one day.

Here is what I found out: there actually was a bug in the built-in toprettyxml() method in Python versions before 2.7.3, and it caused redundant spaces and new lines to be added to the XML output. So if you have Python 2.7.3 or higher, the prettify() function from the question should work fine and you shouldn't see any extra lines or spaces (as long as the input itself carries no indentation whitespace; otherwise strip it first as shown in the answer above). If you are using an older version, here is a workaround that uses a regular expression:

import re
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    ugly_xml = reparsed.toprettyxml(indent="\t")
    # Pull text that toprettyxml() put on its own line back between its tags
    pattern = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
    return pattern.sub(r'>\g<1></', ugly_xml)
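To see what the substitution does without an old interpreter, the sketch below applies the same pattern to a hand-written string shaped like the pre-2.7.3 toprettyxml() output. The sample string and the collapse_text_nodes name are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in prettify() above: '>' + newline + indentation, then text
# that does not start with '<', '>' or whitespace, then newline + indentation
# before a closing tag.
PATTERN = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)

def collapse_text_nodes(ugly_xml):
    # Hypothetical helper name: moves text back next to its tags
    return PATTERN.sub(r'>\g<1></', ugly_xml)

# Hand-written sample in the shape produced by the buggy toprettyxml()
ugly = (
    '<?xml version="1.0" ?>\n'
    '<root>\n'
    '\t<value>\n'
    '\t\t122\n'
    '\t</value>\n'
    '</root>\n'
)

print(collapse_text_nodes(ugly))
```

Only the text node is collapsed; the indentation between elements (such as before `<value>`) is left alone because the character after it is `<`.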
