Getting list of tags from Python minidom XML
I have a fairly simple XML structure that has a certain degree of variability, so I'd like to simplify writing my parser for it. Right now the XML looks similar to this:
    <items>
      <item>
        <Tag1>Some Value</Tag1>
        <Tag2>Some Value</Tag2>
        <Tag3>Some Value</Tag3>
      </item>
    </items>
I've figured out how to get "Some Value" out of the tags and into my data dict, but I don't necessarily know beforehand all of the tags that may or may not be present. I'd like to iterate over everything in the item element and grab the tag name as one value and its text as a separate value.
Right now my code looks like this:
    import sys
    from xml.dom import minidom
    from collections import defaultdict

    project = defaultdict(list)
    xml_file = minidom.parse(sys.argv[1])
    for value in xml_file.getElementsByTagName("Tag1"):
        project['Tag1'].append(xml_file.getElementsByTagName("Tag1")[0].firstChild.data)
    for value in xml_file.getElementsByTagName("Tag2"):
        project['Tag2'].append(xml_file.getElementsByTagName("Tag2")[0].firstChild.data)
    print(project.items())
The reason for the "for value" loops is that a tag may appear multiple times in this context and I want all of them. I'd love to have something like:
    for tag in item:
        for value in xml_file.getElementsByTagName(tag):
            project[tag].append(xml_file.getElementsByTagName(tag)[0].firstChild.data)
That way, if I have 40 different tags, I a) don't have to write 80 lines of code (laziness) and b) can handle dynamic output in the translator if the XML adds or removes tags in the future. I don't control the source, but I know what it is capable of.
Yes, you can take the tags to search for from a list or some other source. When you call

    xml_file.getElementsByTagName(tag)

Python only requires tag to be a string. It does not have to be a string literal; the strings can be read from a file, stored in a list, or obtained from some other source.
Also, the way you are getting the value to add to project[tag] is wrong: xml_file.getElementsByTagName(tag)[0] always refers to the first matching element, so only the first element's value is ever added. Use the loop variable instead, value.firstChild.data, to get each value. Example:
    items = ['Tag1', 'Tag2']
    for tag in items:
        for value in xml_file.getElementsByTagName(tag):
            project[tag].append(value.firstChild.data)
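To make the difference concrete, here is a minimal, self-contained sketch of the loop above. The sample document and the function name collect_known_tags are made up for illustration (they are not from the question); Tag1 appears twice with different values so that the effect of using the loop variable is visible.

```python
from collections import defaultdict
from xml.dom import minidom

# Hypothetical sample document; Tag1 appears twice with different values.
SAMPLE = """<items>
  <item>
    <Tag1>first</Tag1>
    <Tag1>second</Tag1>
    <Tag2>other</Tag2>
  </item>
</items>"""


def collect_known_tags(xml_text, tags):
    """Collect the text of every element whose name is listed in tags."""
    doc = minidom.parseString(xml_text)
    project = defaultdict(list)
    for tag in tags:
        for value in doc.getElementsByTagName(tag):
            # Use the element from the loop, not getElementsByTagName(tag)[0],
            # otherwise the first element's text is appended every time.
            project[tag].append(value.firstChild.data)
    return project


project = collect_known_tags(SAMPLE, ["Tag1", "Tag2"])
print(dict(project))
```

With this sample the result is {'Tag1': ['first', 'second'], 'Tag2': ['other']}; with the [0] indexing from the question, Tag1 would instead contain 'first' twice.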
If what you want is to get all element nodes inside item without knowing the tag names beforehand, the Element object from xml.dom has a tagName attribute that gives the tag of that element. You can use something like the following:
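    from xml.dom.minidom import Node

    # root is the parsed document, e.g. the result of minidom.parse(...)
    for elem in root.getElementsByTagName('item'):
        for x in elem.childNodes:
            # childNodes also contains whitespace text nodes, so keep elements only
            if x.nodeType == Node.ELEMENT_NODE:
                project[x.tagName].append(x.firstChild.data)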
Example/Demo:
    >>> import xml.dom.minidom as md
    >>> s = """<items>
    ... <item>
    ... <Tag1>Some Value</Tag1>
    ... <Tag2>Some Value</Tag2>
    ... <Tag3>Some Value</Tag3>
    ... </item>
    ... </items>"""
    >>> root = md.parseString(s)
    >>> from xml.dom.minidom import Node
    >>> for elem in root.getElementsByTagName('item'):
    ...     for x in elem.childNodes:
    ...         if x.nodeType == Node.ELEMENT_NODE:
    ...             print(x.tagName, x.childNodes[0].data)
    ...
    Tag1 Some Value
    Tag2 Some Value
    Tag3 Some Value
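One thing to watch for with this approach: firstChild is None for an empty element such as <Tag3/>, so x.firstChild.data would raise an AttributeError. The sketch below is one possible way to guard against that while filling the dict. The helper names element_text and collect_all_tags and the sample document are assumptions for illustration only.

```python
from collections import defaultdict
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical sample: two items, a repeated tag, and one empty element.
SAMPLE = """<items>
  <item>
    <Tag1>Some Value</Tag1>
    <Tag2>Another Value</Tag2>
    <Tag3/>
  </item>
  <item>
    <Tag1>Third Value</Tag1>
  </item>
</items>"""


def element_text(elem):
    """Join the direct text children of elem; empty string if there are none."""
    return "".join(
        child.data
        for child in elem.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


def collect_all_tags(xml_text):
    """Map every tag name found directly under an item to a list of its texts."""
    doc = minidom.parseString(xml_text)
    project = defaultdict(list)
    for item in doc.getElementsByTagName("item"):
        for child in item.childNodes:
            # Skip the whitespace text nodes that sit between elements.
            if child.nodeType == Node.ELEMENT_NODE:
                project[child.tagName].append(element_text(child))
    return project


project = collect_all_tags(SAMPLE)
print(dict(project))
```

For this sample the result is {'Tag1': ['Some Value', 'Third Value'], 'Tag2': ['Another Value'], 'Tag3': ['']}. Whether an empty element should be recorded as an empty string or skipped is a design choice that depends on what the translator expects.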
Another way is to use xml.etree.ElementTree (https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree):
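    import sys
    from collections import defaultdict
    from xml.etree import ElementTree as ET

    project = defaultdict(list)
    # sys.argv[1] is a file path, so parse the file; ET.fromstring expects XML text
    xml_tree = ET.parse(sys.argv[1]).getroot()
    for item in xml_tree:
        for t in item:
            # t is a tag under item; the same tag may appear several times
            project[t.tag].append(t.text)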
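For trying the ElementTree approach without a file, here is a small self-contained sketch that parses an in-memory string with ET.fromstring. The sample document and the function name collect_with_elementtree are hypothetical. Note that t.text is None for an empty element, so the sketch substitutes an empty string; that substitution is an assumption about what the caller wants.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Hypothetical sample: a repeated tag and one empty element.
SAMPLE = """<items>
  <item>
    <Tag1>Some Value</Tag1>
    <Tag1>Another Value</Tag1>
    <Tag2/>
  </item>
</items>"""


def collect_with_elementtree(xml_text):
    """Map each tag name under an item element to a list of its text values."""
    root = ET.fromstring(xml_text)
    project = defaultdict(list)
    for item in root.iter("item"):
        # Iterating an Element yields its direct child elements only.
        for child in item:
            project[child.tag].append(child.text or "")
    return project


project = collect_with_elementtree(SAMPLE)
print(dict(project))
```

For this sample the result is {'Tag1': ['Some Value', 'Another Value'], 'Tag2': ['']}. Unlike minidom's childNodes, iterating an ElementTree element does not yield whitespace text nodes, so no node-type check is needed.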