Getting list of tags from Python minidom XML

I have a fairly simple XML structure that has a certain degree of variability, so I'd like to simplify writing my parser for it. Right now the XML looks similar to this:

<items>
    <item>
        <Tag1>Some Value</Tag1>
        <Tag2>Some Value</Tag2>
        <Tag3>Some Value</Tag3>
    </item>
</items>

I've figured out how to get "Some Value" out of the tags and into my data dict, but I don't necessarily know beforehand all of the tags that may or may not be present. I'd like to iterate over everything inside the item element and grab the tag name as one value and its text as a separate value.

Right now my code looks like this:

import sys
from collections import defaultdict
from xml.dom import minidom

project = defaultdict(list)

xml_file = minidom.parse(sys.argv[1])


for value in xml_file.getElementsByTagName("Tag1"):
    project['Tag1'].append(xml_file.getElementsByTagName("Tag1")[0].firstChild.data)
for value in xml_file.getElementsByTagName("Tag2"):
    project['Tag2'].append(xml_file.getElementsByTagName("Tag2")[0].firstChild.data)

print(project.items())

The reason for the "for value" loops is that a tag may appear multiple times in this context and I want all of them. I'd love to have something like:

for tag in item:
    for value in xml_file.getElementsByTagName(tag):
        project[tag].append(xml_file.getElementsByTagName(tag)[0].firstChild.data)

That way, if I have 40 different tags, I a) don't have to write 80 lines of code (laziness) and b) can handle dynamic output in the translator if the XML adds or removes tags in the future. I don't control the source, but I know what it is capable of.

Yes, you can take the tags to search for from a list or some other source. When you do -

xml_file.getElementsByTagName(tag)

Python just needs tag to be a string; it does not have to be a string literal. You can read those strings from a file and store them in a list, define them directly in a list, or get them from some other source.

Also, one more thing: the way you are getting the value to add to project[tag] is wrong. It will always add only the first element's value, because getElementsByTagName(tag)[0] is the first match on every iteration. You should just use value.firstChild.data to get the value. Example -

items = ['Tag1','Tag2']
for tag in items:
    for value in xml_file.getElementsByTagName(tag):
        project[tag].append(value.firstChild.data)
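
As a self-contained illustration of the loop above, here is a minimal sketch that runs against an inline document instead of a file. The sample XML, the function name collect_known_tags, and the variable names are assumptions made for the example; only getElementsByTagName and firstChild.data come from the answer.

```python
from collections import defaultdict
from xml.dom import minidom

# Sample document (an assumption for illustration); Tag1 appears twice.
SAMPLE = """<items>
    <item>
        <Tag1>first</Tag1>
        <Tag2>second</Tag2>
        <Tag1>third</Tag1>
    </item>
</items>"""


def collect_known_tags(doc, tag_names):
    """Map each requested tag name to the text of every matching element."""
    project = defaultdict(list)
    for tag in tag_names:
        for node in doc.getElementsByTagName(tag):
            # Use the node from the loop, not getElementsByTagName(tag)[0]
            project[tag].append(node.firstChild.data)
    return project


doc = minidom.parseString(SAMPLE)
result = collect_known_tags(doc, ["Tag1", "Tag2"])
print(dict(result))
```

Because each iteration reads from its own node, both Tag1 values end up in the list rather than the first one twice.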

If what you want is to get all element nodes inside item without knowing the tag names beforehand, the Element object from xml.dom has a tagName attribute that gives the tag for that element. You can use something like below -

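from xml.dom.minidom import Node

# root is the parsed document, e.g. root = minidom.parse(sys.argv[1])
for elem in root.getElementsByTagName('item'):
    for x in elem.childNodes:
        # childNodes also contains whitespace text nodes, so keep only elements
        if x.nodeType == Node.ELEMENT_NODE:
            project[x.tagName].append(x.firstChild.data)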

Example/Demo -

>>> import xml.dom.minidom as md
>>> s = """<items>
...     <item>
...         <Tag1>Some Value</Tag1>
...         <Tag2>Some Value</Tag2>
...         <Tag3>Some Value</Tag3>
...     </item>
... </items>"""
>>> root = md.parseString(s)
>>> from xml.dom.minidom import Node
>>> for elem in root.getElementsByTagName('item'):
...     for x in elem.childNodes:
...         if x.nodeType == Node.ELEMENT_NODE:
...             print(x.tagName, x.childNodes[0].data)
...
Tag1 Some Value
Tag2 Some Value
Tag3 Some Value
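
The demo assumes every child element contains text. As a hedged sketch, the same traversal can collect results into a dict and tolerate empty elements, for which firstChild is None. The sample document, the function name collect_all_tags, and the choice to store None for empty elements are assumptions for illustration, not part of the original answer.

```python
from collections import defaultdict
from xml.dom import minidom
from xml.dom.minidom import Node

# Sample document (an assumption): two items, one empty element.
SAMPLE = """<items>
    <item>
        <Tag1>Some Value</Tag1>
        <Tag2></Tag2>
    </item>
    <item>
        <Tag1>Other Value</Tag1>
        <Tag3>Third</Tag3>
    </item>
</items>"""


def collect_all_tags(doc):
    """Collect the text of every child element of every <item>, keyed by tag name."""
    project = defaultdict(list)
    for item in doc.getElementsByTagName("item"):
        for child in item.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue  # skip whitespace text nodes between elements
            first = child.firstChild
            # firstChild is None for an empty element such as <Tag2></Tag2>
            if first is not None and first.nodeType == Node.TEXT_NODE:
                project[child.tagName].append(first.data)
            else:
                project[child.tagName].append(None)
    return project


all_tags = collect_all_tags(minidom.parseString(SAMPLE))
print(dict(all_tags))
```

Storing None keeps one entry per element, so the lists stay aligned with the number of occurrences; skipping empty elements instead would be an equally reasonable choice.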

One more way is to use ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

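import sys
from collections import defaultdict
from xml.etree import ElementTree as ET

project = defaultdict(list)

# sys.argv[1] is a file path, so parse the file; ET.fromstring() expects XML text
xml_tree = ET.parse(sys.argv[1]).getroot()

for item in xml_tree:
    for t in item:
        # t is a child element of item; the same tag may appear several times
        project[t.tag].append(t.text)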
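
For trying the ElementTree approach without a file on disk, here is a minimal sketch that parses an inline string. The sample XML and the names SAMPLE, root, and et_project are assumptions for the example; ET.fromstring, Element.iter, and the tag/text attributes are standard ElementTree API.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Sample document (an assumption for illustration).
SAMPLE = """<items>
    <item>
        <Tag1>Some Value</Tag1>
        <Tag2>Second</Tag2>
    </item>
    <item>
        <Tag1>Other Value</Tag1>
    </item>
</items>"""

# fromstring() takes XML text and returns the root element (<items> here)
root = ET.fromstring(SAMPLE)

et_project = defaultdict(list)
for item in root.iter("item"):
    for child in item:
        # Iterating an Element yields only child elements, so no node-type check is needed
        et_project[child.tag].append(child.text)

print(dict(et_project))
```

Unlike minidom, ElementTree does not expose whitespace between elements as separate child nodes, which is why the loop needs no filtering.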
