Null Text Node issue with xml.dom.minidom in Python

Environment: Python 2.6.5, Eclipse SDK 3.7.1, Pydev 2.3

I am trying to parse and change values in XML data in Python using xml.dom.minidom, and I'm having an issue with blank text nodes.

When I parse an XML file into a DOM object and then convert it back to a string using toxml(), the closing "Description" tags that follow the blank text nodes get messed up.

Does anyone know what the problem is?

Contents of issue.py:

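from xml.dom import minidom

# Parse the file into a DOM tree and grab the root <NewsShows> element.
xml_dom_object = minidom.parse('news_shows.xml')
main_node = xml_dom_object.getElementsByTagName('NewsShows')[0]

# Serialize the element back to a string.
xml_string = main_node.toxml()
print(xml_string)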

Contents of news_shows.xml (notice the two blank Text nodes):

<NewsShows Planet="Earth" Language="English" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"></Description>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"></Description>
</NewsShow>
</NewsShows>

Output of the script (notice the two "Description" tags that are messed up):

<NewsShows Language="English" Planet="Earth" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"/>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"/>
</NewsShow>
</NewsShows>

Below is a code snippet from the source file "python-3.2.3.amd64\Lib\xml\dom\minidom.py".

def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)

    attrs = self._get_attributes()
    a_names = sorted(attrs.keys())

    for a_name in a_names:
        writer.write(" %s=\"" % a_name)
        _write_data(writer, attrs[a_name].value)
        writer.write("\"")
    if self.childNodes:
        writer.write(">")
        if (len(self.childNodes) == 1 and
            self.childNodes[0].nodeType == Node.TEXT_NODE):
            self.childNodes[0].writexml(writer, '', '', '')
        else:
            writer.write(newl)
            for node in self.childNodes:
                node.writexml(writer, indent+addindent, addindent, newl)
            writer.write(indent)
        writer.write("</%s>%s" % (self.tagName, newl))
    else:
        writer.write("/>%s"%(newl))

According to this function, if "self" (the element node being written out as XML) has no "childNodes", the writer emits a self-closing tag.
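The behaviour can be reproduced without the original file. The following is a minimal sketch on a made-up two-element document (the names Root, Empty and Full are illustrative and not from news_shows.xml): the parser creates no child node at all for an element written as an immediate open/close pair, so writexml takes the self-closing branch.

```python
from xml.dom import minidom

# Hypothetical sample document: one element with no content, one with text.
SAMPLE = '<Root><Empty Detail="x"></Empty><Full Detail="y">True</Full></Root>'

doc = minidom.parseString(SAMPLE)
empty, full = doc.documentElement.childNodes

# The empty element has no children, not even a blank Text node.
empty_child_count = len(empty.childNodes)

# Serializing it therefore goes through the "/>" branch of writexml.
empty_xml = empty.toxml()
full_xml = full.toxml()

print(empty_child_count)  # 0
print(empty_xml)          # <Empty Detail="x"/>
print(full_xml)           # <Full Detail="y">True</Full>
```

In other words, there is no "null text node" in the parsed tree. The element simply has no children, and the serializer picks the shorter spelling.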

Is this actually causing a problem somewhere? As far as I know about XML, the strings <tag></tag> and <tag/> are equivalent.
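A quick way to check the equivalence is to parse both spellings and compare the result. The sketch below also shows one possible workaround, offered as an assumption rather than a recommended API: if some downstream consumer insists on an explicit end tag, appending an empty Text node gives the element a child, which sends writexml down its other branch. The helper name child_count is hypothetical.

```python
from xml.dom import minidom

def child_count(xml_text):
    """Return how many child nodes the root element has after parsing."""
    return len(minidom.parseString(xml_text).documentElement.childNodes)

long_form = '<tag></tag>'
short_form = '<tag/>'

# Both spellings parse to an element with no children.
same = child_count(long_form) == child_count(short_form) == 0

# Workaround sketch: an empty Text node makes childNodes non-empty,
# so the serializer writes an explicit start and end tag.
doc = minidom.parseString(short_form)
doc.documentElement.appendChild(doc.createTextNode(''))
explicit = doc.documentElement.toxml()

print(same)      # True
print(explicit)  # <tag></tag>
```

Note that calling normalize() on the tree would remove the empty Text node again, so the workaround only holds if the node is added just before serializing.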
