Parse multiple XML files with Python when not all files have the element

Question

This is my code, and I have a problem with it:

import glob
from xml.dom import minidom

openFiles = 'myxml/*.xml'

list = []

for xmlfiles in glob.glob(openFiles):
    doc = minidom.parse(xmlfiles)
    root = doc.getElementsByTagName("info")[0]
    project_name = root.getAttribute('project_name')
    list.append(project_name)
    ....

This code works for a single file, but it raises an error when I run it over multiple files, because not all of the files have an 'info' element. Is there a way to keep it running and record 'none' for the files without that element?

For example, the result should look like this:

project1, project2, none, project3, none

Sorry for my bad English, and thank you in advance.

Answer 1

You could add try...except handling, following the common Python coding principle EAFP (Easier to Ask Forgiveness than Permission). If there is no 'info' element in the XML, the IndexError is caught and None is appended to the list:

for xmlfiles in glob.glob(openFiles):
    doc = minidom.parse(xmlfiles)
    try:
        # Indexing an empty result raises IndexError
        root = doc.getElementsByTagName('info')[0]
    except IndexError:
        # No 'info' element in this file
        list.append(None)
    else:
        project_name = root.getAttribute('project_name')
        list.append(project_name)
    ....
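
To try the EAFP approach without any files on disk, here is a minimal sketch that parses in-memory strings with minidom.parseString instead. The helper name project_name_or_none and the sample documents are made up for illustration; they are not from the question.

```python
from xml.dom import minidom


def project_name_or_none(xml_text):
    """Return project_name of the first <info> element, or None if there is none.

    Hypothetical helper, named here only for illustration.
    """
    doc = minidom.parseString(xml_text)
    try:
        # An empty result list raises IndexError on [0]
        root = doc.getElementsByTagName('info')[0]
    except IndexError:
        return None
    return root.getAttribute('project_name')


# Made-up sample documents; the second one has no <info> element
samples = [
    '<root><info project_name="project1"/></root>',
    '<root><other/></root>',
    '<root><info project_name="project3"/></root>',
]
names = [project_name_or_none(text) for text in samples]
print(names)  # ['project1', None, 'project3']
```

Returning None from a small function keeps the loop body to a single append, so the "missing element" case cannot accidentally fall through to the attribute lookup.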

Or you could use the LBYL (Look Before You Leap) coding principle and check that an 'info' element exists in the XML before reading its attribute:

for xmlfiles in glob.glob(openFiles):
    doc = minidom.parse(xmlfiles)
    # Look the elements up once and reuse the result
    info_elements = doc.getElementsByTagName('info')
    if len(info_elements):
        root = info_elements[0]
        project_name = root.getAttribute('project_name')
        list.append(project_name)
    else:
        list.append(None)
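
One thing to be aware of with either version: minidom's getAttribute returns an empty string when the attribute is missing, so an 'info' element without project_name would add '' to the list rather than None. If that case matters, a sketch of the LBYL variant that also checks the attribute could look like this. The function name collect_project_names is hypothetical, and sorting the paths is an assumption made only to get a stable order.

```python
import glob
from xml.dom import minidom


def collect_project_names(pattern):
    """Return one entry per matching file: the project_name, or None.

    Hypothetical helper; 'pattern' is a glob pattern such as 'myxml/*.xml'.
    """
    names = []
    # glob.glob makes no ordering promise, so sort for a repeatable result
    for path in sorted(glob.glob(pattern)):
        doc = minidom.parse(path)
        infos = doc.getElementsByTagName('info')
        # Check both that the element exists and that it carries the attribute
        if infos and infos[0].hasAttribute('project_name'):
            names.append(infos[0].getAttribute('project_name'))
        else:
            names.append(None)
    return names


# Usage, assuming the directory layout from the question:
# names = collect_project_names('myxml/*.xml')
```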

solution1
1 ACCEPTED 2015-06-17 15:47:56
