
Getting list of tags from Python minidom XML

I have a fairly simple XML structure that has a certain degree of variability, so I'd like to simplify writing my parser for it. Right now the XML looks similar to this:

<items>
    <item>
        <Tag1>Some Value</Tag1>
        <Tag2>Some Value</Tag2>
        <Tag3>Some Value</Tag3>
    </item>
</items>

I've figured out how to properly get "Some Value" out of the tags and into my data dict, but I don't necessarily know beforehand all of the tags that may or may not be present. I'd like to iterate over everything in the item element and grab the tag name as one value and its text as a separate value.

Right now my code looks like this:

import sys
from xml.dom import minidom
from collections import defaultdict

project = defaultdict(list)

xml_file = minidom.parse(sys.argv[1])


for value in xml_file.getElementsByTagName("Tag1"):
    project['Tag1'].append(xml_file.getElementsByTagName("Tag1")[0].firstChild.data)
for value in xml_file.getElementsByTagName("Tag2"):
    project['Tag2'].append(xml_file.getElementsByTagName("Tag2")[0].firstChild.data)

print(project.items())

The reason for the "for value" loops is that a tag may appear multiple times in this context and I want all of them. I'd love to have something like

for tag in item:
    for value in xml_file.getElementsByTagName(tag):
        project[tag].append(xml_file.getElementsByTagName(tag)[0].firstChild.data)

That way if I have 40 different tags I a) don't have to write 80 lines of code (laziness) and b) can handle dynamic output in the translator if the XML adds/subtracts tags in the future as I don't control the source but I know what it is capable of.

Yes, you can take the tags to search for from a list or some other source. When you do -

xml_file.getElementsByTagName(tag)

Python only needs tag to be a string; it does not have to be a string literal. The strings can be read from a file, stored directly in a list, or obtained from any other source.

Also, the way you are getting the value to add to project[tag] is wrong: xml_file.getElementsByTagName(tag)[0] always refers to the first matching element, so only the first element's value is ever added. Use value.firstChild.data to get the value of the current element. Example -

items = ['Tag1','Tag2']
for tag in items:
    for value in xml_file.getElementsByTagName(tag):
        project[tag].append(value.firstChild.data)
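
To see why indexing with [0] only ever stores the first value, here is a small self-contained sketch. The sample XML (Tag1 appearing twice with different values) and the names wrong and right are made up for illustration, and the printed results assume Python 3.

```python
from collections import defaultdict
from xml.dom import minidom

# Hypothetical sample: Tag1 appears twice with different values.
doc = minidom.parseString(
    "<items><item><Tag1>first</Tag1><Tag1>second</Tag1></item></items>"
)

wrong = defaultdict(list)
right = defaultdict(list)

for value in doc.getElementsByTagName("Tag1"):
    # Re-querying and taking [0] ignores the loop variable entirely.
    wrong["Tag1"].append(doc.getElementsByTagName("Tag1")[0].firstChild.data)
    # Using the loop variable reads the element for this iteration.
    right["Tag1"].append(value.firstChild.data)

print(wrong["Tag1"])  # ['first', 'first']
print(right["Tag1"])  # ['first', 'second']
```

Both lists have two entries, but only the second one contains both values.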

If what you want is to get all element nodes inside item without knowing the tag names beforehand, each Element object in xml.dom has a tagName attribute that gives the tag for that element. You can use something like below -

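from xml.dom.minidom import Node
# root is the parsed document (xml_file in the question's code)
for elem in root.getElementsByTagName('item'):
    for x in elem.childNodes:
        # childNodes also contains whitespace text nodes; keep only elements
        if x.nodeType == Node.ELEMENT_NODE:
            project[x.tagName].append(x.firstChild.data)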

Example/Demo -

>>> import xml.dom.minidom as md
>>> s = """<items>
...     <item>
...         <Tag1>Some Value</Tag1>
...         <Tag2>Some Value</Tag2>
...         <Tag3>Some Value</Tag3>
...     </item>
... </items>"""
>>> root = md.parseString(s)
>>> from xml.dom.minidom import Node
>>> for elem in root.getElementsByTagName('item'):
...     for x in elem.childNodes:
...         if x.nodeType == Node.ELEMENT_NODE:
...             print(x.tagName, x.childNodes[0].data)
...
Tag1 Some Value
Tag2 Some Value
Tag3 Some Value
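
Putting the pieces together, a minimal sketch of the minidom approach might look like the following. The function name collect_item_tags and the sample data are hypothetical, and storing None for an empty element such as <Tag3/> is just one possible choice; the guard is there because firstChild is None when an element has no children.

```python
from collections import defaultdict
from xml.dom import minidom
from xml.dom.minidom import Node


def collect_item_tags(xml_text):
    """Map each child tag name under <item> to the list of its text values."""
    project = defaultdict(list)
    doc = minidom.parseString(xml_text)
    for item in doc.getElementsByTagName("item"):
        for child in item.childNodes:
            # Skip the whitespace text nodes between elements.
            if child.nodeType != Node.ELEMENT_NODE:
                continue
            first = child.firstChild
            # An empty element such as <Tag3/> has no child node at all.
            if first is not None and first.nodeType == Node.TEXT_NODE:
                project[child.tagName].append(first.data)
            else:
                project[child.tagName].append(None)
    return project


# Hypothetical sample: Tag1 repeats and Tag3 is empty.
sample = """<items>
    <item>
        <Tag1>A</Tag1>
        <Tag2>B</Tag2>
        <Tag1>C</Tag1>
        <Tag3/>
    </item>
</items>"""

result = collect_item_tags(sample)
print(dict(result))
```

No tag names are hard-coded, so tags added to or removed from the source later are picked up without code changes.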

Another way is to use xml.etree.ElementTree (https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree) -

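import sys
from collections import defaultdict
from xml.etree import ElementTree as ET

project = defaultdict(list)

# sys.argv[1] is a file path, so parse the file and take its root (<items>);
# ET.fromstring() would instead expect the XML text itself
xml_tree = ET.parse(sys.argv[1]).getroot()

for item in xml_tree:
    for t in item:
        # here t is a tag under item; the same tag may appear multiple times
        project[t.tag].append(t.text)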
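
For a quick check without a file on disk, the same idea can be sketched with ET.fromstring on an in-memory string (for a file path, ET.parse(path).getroot() gives the equivalent root element). The sample XML here is made up for illustration.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Hypothetical sample: Tag1 appears twice.
sample = """<items>
    <item>
        <Tag1>A</Tag1>
        <Tag2>B</Tag2>
        <Tag1>C</Tag1>
    </item>
</items>"""

project = defaultdict(list)

# fromstring() takes XML text and returns the root element (<items>).
root = ET.fromstring(sample)
for item in root.findall("item"):
    # Iterating an Element yields only its child elements, so no
    # node-type check is needed as with minidom.
    for child in item:
        project[child.tag].append(child.text)

print(dict(project))  # {'Tag1': ['A', 'C'], 'Tag2': ['B']}
```

Note that child.text is None for an empty element, so callers that expect strings may want to handle that case.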
