
Parsing XML data with python - how to capture everything in a more pythonic way?

The aim is to capture all the data in an XML file. Once captured, I compare it to a reference XML file to check that nothing has changed, and then report what the differences are.

What I wrote works for what I need, but it is cumbersome and a bit messy. Is there a better way to iterate through all items at all depths of an XML file? The solution just has to be robust enough to capture everything.

Currently, iterating as I do below uses too many layers of iteration with try/except, which is very ugly!

import xml.etree.ElementTree as ET

def xml_iter(file):
    tree = ET.parse(file)
    root = tree.getroot()

    texts = []
    for elem in root:
        for i in elem:
            try:
                texts.append(i.text.strip())
            except AttributeError:  # element has no text
                pass

            for j in i:
                try:
                    texts.append(j.text.strip())
                except AttributeError:
                    pass

                for k in j:
                    try:
                        texts.append(k.text.strip())
                    except AttributeError:
                        pass
    return texts

Any help would be greatly appreciated.

Use Element.iter(). It iterates recursively over all the sub-elements.

For your case, it would be something like:

dict_list = []
text_list = []
for node in root.iter():
    dict_list.append(node.attrib)  # the attribute dictionary of each element
    text_list.append(node.text)

# Do the same for the other file and compare the dictionaries/strings
# in the corresponding lists.

You can look at this official tutorial for examples.
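To make the idea concrete, here is a minimal, self-contained sketch of the whole workflow: walk every element at any depth with root.iter(), record its tag, attributes, and text, and compare the two snapshots position by position. The helper names snapshot and diff are hypothetical, the sample documents are made up, and the comparison assumes both files list their elements in the same order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; in practice you would use ET.parse(path).
CURRENT = "<root><a id='1'>foo</a><b><c>bar</c></b></root>"
REFERENCE = "<root><a id='1'>foo</a><b><c>baz</c></b></root>"

def snapshot(xml_string):
    """Collect (tag, attrib, text) for every element at any depth."""
    root = ET.fromstring(xml_string)
    return [(node.tag, node.attrib, (node.text or "").strip())
            for node in root.iter()]

def diff(current, reference):
    """Return the pairs of snapshot entries that differ."""
    return [(c, r) for c, r in zip(snapshot(current), snapshot(reference))
            if c != r]

print(diff(CURRENT, REFERENCE))
# → [(('c', {}, 'bar'), ('c', {}, 'baz'))]
```

Note that zip() stops at the shorter list, so if one file has extra elements you would also want to compare the snapshot lengths.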
