Finding Tags within an XML file with Python
I need some help with my Python code for handling an XML file. I want to get the subtags, store them in lists, and do some processing on them. Until now my code worked because I assumed the XML structure was the same for every file I had, so I used the ElementTree library for parsing, then .findall(tagname), and after that I processed the resulting lists. But then I realized that some files have more tags, so I don't get everything I need. To give you an idea:
<parent tag (same for every file)>
<tag1>
.....
</tag1>
<tag2>
.....
</tag2>
<tag3>
.....
</tag3>
<unknown tag1>
.....
</unknown tag1>
<unknown tag2>
.....
</unknown tag2>
<tag2>
.....
</tag2>
<tag2>
.....
</tag2>
<unknown tag1>
.....
</unknown tag1>
</parent tag>
So my current code is:
list1 = root.findall('tag1')
list2 = root.findall('tag2')
list3 = root.findall('tag3')
and then I process what is inside those tags, which works. I need help detecting every tag under the parent tag and storing them in a list, so I can call findall() for each tag in the list. Something like:
list_of_tags = [tag1, tag2, tag3, unknown tag1, etc]
for tag in list_of_tags:
    ....
Thank you in advance!
I actually parse the XML files with ElementTree like this:
import xml.etree.ElementTree as ET

try:
    tree = ET.parse(filename)
except IOError as e:
    # Raised when the file is missing or cannot be opened
    print('No such file or directory')
else:
    root = tree.getroot()
You can use xmltodict
pip install xmltodict
And here's how you can get all the child tags under a parent tag:
import xmltodict
my_xml = """<parent_tag>
<tag1>
.....
</tag1>
<tag2>
.....
</tag2>
<tag3>
.....
</tag3>
<unknown_tag1>
.....
</unknown_tag1>
<unknown_tag2>
.....
</unknown_tag2>
<tag2>
.....
</tag2>
<tag2>
.....
</tag2>
<unknown_tag1>
.....
</unknown_tag1>
</parent_tag>"""
xmld = xmltodict.parse(my_xml)
child_tags = xmld['parent_tag'].keys()
for child_tag in child_tags:
    print(child_tag)
The output will look like this:
tag1
tag2
tag3
unknown_tag1
unknown_tag2
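If adding a dependency is not an option, the same listing can probably be produced with ElementTree alone. The following is only a sketch: the sample document, its element text and the helper name unique_child_tags are made up for illustration and are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Placeholder document mirroring the structure in the question;
# the element text ("a", "b", ...) is invented for this sketch.
SAMPLE_XML = """<parent_tag>
    <tag1>a</tag1>
    <tag2>b</tag2>
    <tag3>c</tag3>
    <unknown_tag1>d</unknown_tag1>
    <unknown_tag2>e</unknown_tag2>
    <tag2>f</tag2>
    <tag2>g</tag2>
    <unknown_tag1>h</unknown_tag1>
</parent_tag>"""


def unique_child_tags(parent):
    """Return the tag names of parent's direct children, in first-seen order."""
    # Iterating an Element yields its direct children;
    # dict.fromkeys drops repeated names while preserving order.
    return list(dict.fromkeys(child.tag for child in parent))


root = ET.fromstring(SAMPLE_XML)
child_tags = unique_child_tags(root)
for child_tag in child_tags:
    print(child_tag)
```

Each name in child_tags can then be passed to root.findall(), as in the question's original code.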
----- SOLUTION -----
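tags = []
tagslist = []
# Collect the tag name of each direct child of the root.
# (getchildren() was removed in Python 3.9; iterating the element is equivalent.)
for child in root:
    tags.append(child.tag)
for tag in tags:
    tagslist = tagslist + root.findall(tag)
# remove duplicates (a repeated tag name adds the same elements again)
tagslist = list(dict.fromkeys(tagslist))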
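A variation on the solution above, in case it helps: rather than one flat list, the children can be grouped by tag name in a single pass, which avoids the duplicate-removal step. This is a minimal sketch under assumptions; the sample XML, its element text and the function name group_children_by_tag are hypothetical and not from the original post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder document; tag names follow the question, text values are invented.
SAMPLE_XML = """<parent_tag>
    <tag1>a</tag1>
    <tag2>b</tag2>
    <unknown_tag1>c</unknown_tag1>
    <tag2>d</tag2>
    <tag2>e</tag2>
    <unknown_tag1>f</unknown_tag1>
</parent_tag>"""


def group_children_by_tag(parent):
    """Map each child tag name to the list of direct children carrying that tag."""
    groups = defaultdict(list)
    for child in parent:  # direct children only, in document order
        groups[child.tag].append(child)
    return dict(groups)


root = ET.fromstring(SAMPLE_XML)
groups = group_children_by_tag(root)
for tag, elements in groups.items():
    # elements holds the same Element objects that root.findall(tag) would return
    print(tag, [element.text for element in elements])
```

The per-tag processing from the question can then run inside that loop, without knowing the tag names in advance.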