I have a fairly simple XML structure with a certain degree of variability, and I'd like to simplify the parser I'm writing for it. Right now the XML looks similar to this:
<items>
    <item>
        <Tag1>Some Value</Tag1>
        <Tag2>Some Value</Tag2>
        <Tag3>Some Value</Tag3>
    </item>
</items>
I've figured out how to get "Some Value" out of the tags and into my data dict, but I don't necessarily know beforehand all of the tags that may or may not be present. I'd like to iterate over everything inside the item element, using each tag name as the key and its text as the value.
Right now my code looks like this:
import sys
from collections import defaultdict
from xml.dom import minidom

project = defaultdict(list)
xml_file = minidom.parse(sys.argv[1])
for value in xml_file.getElementsByTagName("Tag1"):
    project['Tag1'].append(xml_file.getElementsByTagName("Tag1")[0].firstChild.data)
for value in xml_file.getElementsByTagName("Tag2"):
    project['Tag2'].append(xml_file.getElementsByTagName("Tag2")[0].firstChild.data)
print(project.items())
The reason for the "for value" loops is that a tag may appear multiple times in this context and I want all of the occurrences. I'd love to have something like:
for tag in item:
    for value in xml_file.getElementsByTagName(tag):
        project[tag].append(xml_file.getElementsByTagName(tag)[0].firstChild.data)
That way, if I have 40 different tags, I a) don't have to write 80 lines of code (laziness) and b) can handle dynamic output in the translator if the XML adds or removes tags in the future. I don't control the source, but I know what it is capable of.
Yes, you can take the tags to search for from a list or some other source. When you call xml_file.getElementsByTagName(tag), Python only needs tag to be a string. It does not have to be a string literal: the strings can be read from a file, stored directly in a list, or obtained from some other source.
Also, the way you are getting the value to add to project[tag] is wrong: xml_file.getElementsByTagName(tag)[0] always refers to the first matching element, so only the first element's value is ever added. Use value.firstChild.data to get the value instead. Example:
items = ['Tag1', 'Tag2']
for tag in items:
    for value in xml_file.getElementsByTagName(tag):
        project[tag].append(value.firstChild.data)
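Putting the pieces above together, here is a minimal self-contained sketch. The sample XML string, the function name collect_by_tag, and the tag list are assumptions made for illustration; it parses a string with minidom.parseString rather than reading the file named in sys.argv[1].

```python
from collections import defaultdict
from xml.dom import minidom

# Sample document assumed for illustration; Tag1 appears twice on purpose.
SAMPLE_XML = """<items>
  <item>
    <Tag1>First</Tag1>
    <Tag1>Second</Tag1>
    <Tag2>Third</Tag2>
  </item>
</items>"""


def collect_by_tag(xml_text, tags):
    """Return {tag: [text, ...]} for every element whose name is in tags."""
    doc = minidom.parseString(xml_text)
    project = defaultdict(list)
    for tag in tags:
        for node in doc.getElementsByTagName(tag):
            # Read from the node yielded by the loop, not from
            # getElementsByTagName(tag)[0], so every occurrence is kept.
            project[tag].append(node.firstChild.data)
    return project


result = collect_by_tag(SAMPLE_XML, ["Tag1", "Tag2"])
print(dict(result))
```

The tag list could just as well come from a config file or a command-line argument, since getElementsByTagName only needs a string.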
If what you want is to get all element nodes inside item without knowing the tag names beforehand, the Element object from xml.dom has a tagName attribute that gives the tag for that element. You can use something like this:
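from xml.dom.minidom import Node

# root is the parsed document (xml_file in the question's code)
for elem in root.getElementsByTagName('item'):
    for x in elem.childNodes:
        # childNodes also contains whitespace text nodes, so keep only elements
        if x.nodeType == Node.ELEMENT_NODE:
            project[x.tagName].append(x.firstChild.data)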
Example/Demo:
>>> import xml.dom.minidom as md
>>> s = """<items>
... <item>
... <Tag1>Some Value</Tag1>
... <Tag2>Some Value</Tag2>
... <Tag3>Some Value</Tag3>
... </item>
... </items>"""
>>> root = md.parseString(s)
>>> from xml.dom.minidom import Node
>>> for elem in root.getElementsByTagName('item'):
...     for x in elem.childNodes:
...         if x.nodeType == Node.ELEMENT_NODE:
...             print(x.tagName, x.childNodes[0].data)
...
Tag1 Some Value
Tag2 Some Value
Tag3 Some Value
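One caveat with firstChild.data: an empty element such as <Tag3/> has no child nodes, so firstChild is None and the attribute access would raise. The sketch below is one possible way to guard against that. The sample XML, the function name collect_children, and the choice to store an empty string for empty elements are all assumptions for illustration.

```python
from collections import defaultdict
from xml.dom import minidom
from xml.dom.minidom import Node

# Sample document assumed for illustration; Tag3 is deliberately empty.
SAMPLE_XML = """<items>
  <item>
    <Tag1>Some Value</Tag1>
    <Tag2>Other Value</Tag2>
    <Tag3/>
  </item>
</items>"""


def collect_children(xml_text, parent_tag="item"):
    """Map each child tag name of parent_tag elements to a list of its texts."""
    doc = minidom.parseString(xml_text)
    project = defaultdict(list)
    for parent in doc.getElementsByTagName(parent_tag):
        for child in parent.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue  # skip the whitespace text nodes between elements
            first = child.firstChild
            # An empty element has no children, so firstChild is None.
            if first is not None and first.nodeType == Node.TEXT_NODE:
                text = first.data
            else:
                text = ""
            project[child.tagName].append(text)
    return project


result = collect_children(SAMPLE_XML)
print(dict(result))
```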
One more way is to use ElementTree (https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree):
import sys
from xml.etree import ElementTree as ET

# sys.argv[1] is a file path, so parse the file and take its root element (<items>)
xml_tree = ET.parse(sys.argv[1]).getroot()
for item in xml_tree:
    for t in item:
        # here t is a tag under item; the same tag may appear multiple times
        project[t.tag].append(t.text)
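As a self-contained sketch of the ElementTree approach, the following parses an in-memory string with ET.fromstring. The sample XML and the function name collect_with_etree are assumptions for illustration.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Sample document assumed for illustration; two <item> elements share Tag1.
SAMPLE_XML = """<items>
  <item>
    <Tag1>A</Tag1>
    <Tag2>B</Tag2>
  </item>
  <item>
    <Tag1>C</Tag1>
  </item>
</items>"""


def collect_with_etree(xml_text):
    """Group the text of every grandchild of the root by its tag name."""
    root = ET.fromstring(xml_text)  # root is the <items> element
    project = defaultdict(list)
    for item in root:        # iterating an element yields its child elements
        for child in item:   # no whitespace text nodes to filter out here
            project[child.tag].append(child.text)
    return project


result = collect_with_etree(SAMPLE_XML)
print(dict(result))
```

Note that ET.fromstring expects the XML text itself; for a file path such as sys.argv[1], ET.parse(path).getroot() returns the equivalent root element. Also, child.text is None for an empty element, so downstream code may need to handle that.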