
Python minidom look for empty text node

I am parsing an XML file with the minidom parser: I iterate over the XML and store specific information found between the tags in a dictionary.

Like this:

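from xml.dom.minidom import parseString

d_source = {}
dom = parseString(data)  # data holds the XML document as a string
macro = dom.getElementsByTagName('macro')
for node in macro:
    # serialize the first <id> element and strip its surrounding tags
    id_name = node.getElementsByTagName('id')[0].toxml()
    id_data = id_name.replace('<id>', '').replace('</id>', '')
    print(id_data)
    # same for the second <cl> element; this is the lookup that causes trouble
    cl_name = node.getElementsByTagName('cl')[1].toxml()
    cl_data = cl_name.replace('<cl>', '').replace('</cl>', '')
    print(cl_data)
    d_source[id_data] = cl_data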

Now, my problem is that the data I'm looking for in cl_name=node.getElementsByTagName('cl')[1].toxml() is sometimes non-existent!

In this case the relevant part of the XML looks like this:

<cl>blabla</cl>
<cl></cl>

Because of this I receive an "index out of range" error. However, I really need this "nothing" in my dictionary. My dictionary should look like this:

d = {'blabla': '', 'xyz': 'abc'}

I have to look for the empty text node, which I tried by doing this:

if node.getElementsByTagName('cl')[1].toxml is None:
    print ('')
else:
    cl_name=node.getElementsByTagName('cl')[1].toxml()
    cl_data=cl_name.replace('<cl>','').replace('</cl>','')
    print (cl_data)
    d_target[id_data]=(cl_data)
    print(d_target)

I still receive that indexing error... I also thought about inserting a white space into the original source file, but I am not sure whether this would solve the issue. Any ideas?

If minidom is not a hard requirement, I suggest switching to the standard xml.etree.ElementTree. It is much easier.
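A minimal sketch of what that could look like, assuming each <macro> holds one <id> and up to two <cl> children. The sample data, the <root> wrapper and the second macro are made up for illustration; only the tag names come from the question. In ElementTree an empty element such as <cl></cl> simply has text set to None, so no indexing into child nodes is needed:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the <root> wrapper and the values are assumptions.
data = """<root>
  <macro><id>blabla</id><cl>source text</cl><cl></cl></macro>
  <macro><id>xyz</id><cl>source text</cl><cl>abc</cl></macro>
</root>"""

root = ET.fromstring(data)
d_target = {}
for macro in root.iter('macro'):
    id_data = macro.findtext('id', default='')
    cl_nodes = macro.findall('cl')
    # Element.text is None for an empty element, so fall back to ''.
    # The length check also covers a macro with fewer than two <cl> tags.
    cl_data = (cl_nodes[1].text or '') if len(cl_nodes) > 1 else ''
    d_target[id_data] = cl_data

print(d_target)
```

With this sample input the empty <cl></cl> ends up as an empty string under its id, which is the shape the question asks for.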

I figured out that it works when I add a white space to the original source file. This looks a bit messy, though. So if anyone has a better idea, I'm looking forward to it!
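One possible way to stay with minidom and leave the source file untouched is to read the text from the element's child nodes instead of stripping tags from toxml(). An empty element like <cl></cl> has no child nodes at all, so anything that indexes into childNodes raises an IndexError, while joining over the children just yields an empty string. The helper name element_text and the sample data below are assumptions made for this sketch, not part of the original code:

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def element_text(element):
    """Join the text children of a minidom element; '' if there are none."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


# Hypothetical sample data; the <root> wrapper and the values are assumptions.
data = """<root>
  <macro><id>blabla</id><cl>source text</cl><cl></cl></macro>
  <macro><id>xyz</id><cl>source text</cl><cl>abc</cl></macro>
</root>"""

dom = parseString(data)
d_target = {}
for node in dom.getElementsByTagName('macro'):
    id_data = element_text(node.getElementsByTagName('id')[0])
    cl_nodes = node.getElementsByTagName('cl')
    # Guard the index as well, in case a macro has fewer than two <cl> tags.
    cl_data = element_text(cl_nodes[1]) if len(cl_nodes) > 1 else ''
    d_target[id_data] = cl_data

print(d_target)
```

This also avoids the string replacement on toxml(), which would stop matching as soon as a tag carries an attribute or is serialized in self-closing form.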

