Null Text Node issue with xml.dom.minidom in Python
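Environment: Python 2.6.5, Eclipse SDK 3.7.1, PyDev 2.3
I am trying to parse XML data and change values in it using xml.dom.minidom in Python, and I am running into a problem with empty Text nodes.
When I parse the XML file into a DOM object and then convert it back to a string with toxml(), every "Description" tag that has an empty Text node comes out mangled.
Does anyone know what the problem is?
Contents of issue.py: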
from xml.dom import minidom
xml_dom_object = minidom.parse('news_shows.xml')
main_node = xml_dom_object.getElementsByTagName('NewsShows')[0]
xml_string = main_node.toxml()
print xml_string
Contents of news_shows.xml (notice the two empty Text nodes):
<NewsShows Planet="Earth" Language="English" Year="2012">
<NewsShow ShowName="The_Young_Turks">
<Description Detail="Best_show_of_all_time_according_to_many">True</Description>
<Description Detail="The_only_source_of_truth"></Description>
<Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"></Description>
</NewsShow>
</NewsShows>
Output of the script (notice the two mangled "Description" tags):
<NewsShows Language="English" Planet="Earth" Year="2012">
<NewsShow ShowName="The_Young_Turks">
<Description Detail="Best_show_of_all_time_according_to_many">True</Description>
<Description Detail="The_only_source_of_truth"/>
<Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"/>
</NewsShow>
</NewsShows>
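To narrow down where the difference comes from, here is a minimal sketch that parses both spellings of an empty element and compares the resulting DOM nodes. The attribute value "x" and the variable names are made up for illustration. It suggests that the parser does not create an empty Text node at all: in both cases the element simply has no children, so the serializer has nothing to distinguish them by.

```python
from xml.dom import minidom

# Parse the two spellings of an empty element (illustrative input only).
explicit = minidom.parseString(
    '<Description Detail="x"></Description>').documentElement
self_closing = minidom.parseString(
    '<Description Detail="x"/>').documentElement

# Neither element has any child nodes, so no empty Text node exists.
explicit_children = len(explicit.childNodes)
self_closing_children = len(self_closing.childNodes)

# Both therefore serialize to the same string.
same_output = explicit.toxml() == self_closing.toxml()

print(explicit_children)
print(self_closing_children)
print(same_output)
print(explicit.toxml())
```

In other words, the information about which spelling was used in the file is already gone after parsing, before toxml() is ever called.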
Below is a snippet from the source file "python-3.2.3.amd64\Lib\xml\dom\minidom.py".
def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)
    attrs = self._get_attributes()
    a_names = sorted(attrs.keys())
    for a_name in a_names:
        writer.write(" %s=\"" % a_name)
        _write_data(writer, attrs[a_name].value)
        writer.write("\"")
    if self.childNodes:
        writer.write(">")
        if (len(self.childNodes) == 1 and
            self.childNodes[0].nodeType == Node.TEXT_NODE):
            self.childNodes[0].writexml(writer, '', '', '')
        else:
            writer.write(newl)
            for node in self.childNodes:
                node.writexml(writer, indent+addindent, addindent, newl)
            writer.write(indent)
        writer.write("</%s>%s" % (self.tagName, newl))
    else:
        writer.write("/>%s"%(newl))
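According to this function, if "self" (the node being written out as XML) has no childNodes, the writer emits a self-closing tag.
Is this actually causing a problem anywhere? As far as I understand XML, the strings <tag></tag> and <tag/> are equivalent.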
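If some downstream consumer really does need the <tag></tag> spelling, one possible workaround is to give each childless element an empty Text child, so that the "if self.childNodes" branch in writexml is taken. The sketch below is only an illustration under that assumption: pad_empty_elements is a hypothetical helper, not part of minidom, and SAMPLE is a shortened version of the file above.

```python
from xml.dom import minidom

# Shortened sample input, written on one line so the round trip is easy to compare.
SAMPLE = (
    '<NewsShow ShowName="The_Young_Turks">'
    '<Description Detail="The_only_source_of_truth"></Description>'
    '<Description Detail="Three_hours_of_truth_per_day">True</Description>'
    '</NewsShow>'
)


def pad_empty_elements(node, tag_name):
    """Hypothetical helper: append an empty Text node to every
    childless <tag_name> element below `node`."""
    doc = node.ownerDocument
    for element in node.getElementsByTagName(tag_name):
        if not element.childNodes:
            element.appendChild(doc.createTextNode(''))


root = minidom.parseString(SAMPLE).documentElement

before = root.toxml()   # empty element is written as <Description .../>
pad_empty_elements(root, 'Description')
after = root.toxml()    # empty element is written as <Description ...></Description>

print(before)
print(after)
```

This changes only how the element is serialized. Any XML parser should treat the two outputs as the same document, so the padding is only worth doing when a tool that reads the output is sensitive to the exact text.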