Parse XML files with Python 3

Let's assume that I have a couple of XML files. The first file is the base, and the following files are overrides (updates to the base file). I want to write a program that takes a list of files (the base followed by the updates) and creates a final XML file with all the data. I have managed to read each file, but I don't know how to combine them.

The original (base) XML:

<Base>
<Module ID = "Module1"
    Prop1 = "A"
    Prop2 = "B"
    Prop3 = "C"
/>
<!-- XML comment -->
<Module ID = "Module2"
    Prop1 = "D"
    Prop2 = "E"
    Prop3 = "F"
/>  
</Base>

An update (override):

<!-- XML comment -->
<Override>
<Module ID = "Module1"
    Prop2 = "B_ov"
    Prop4 = "ZZ"
/>
<!-- XML comment -->
<Module ID = "Module2"
    Prop1 = "D_ov"
    Prop5 = "F"
/>  
</Override>

The final XML file should look like this:

 <!-- XML comment -->
<final>
<Module ID = "Module1"
    Prop1 = "A"
    Prop2 = "B_ov"
    Prop3 = "C"
    Prop4 = "ZZ"
/>
<!-- XML comment -->
<Module ID = "Module2"
    Prop1 = "D_ov"
    Prop2 = "E"
    Prop3 = "F"
    Prop5 = "F"
/>  
</final>

My code so far (it reads each file into a dictionary keyed by module ID):

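from argparse import ArgumentParser
from xml.etree import ElementTree


def main():
    parser = ArgumentParser()
    # e.g. Base.xml Override.xml
    parser.add_argument('xml', nargs='+')
    args = parser.parse_args()

    # The first file is the base; the rest are overrides
    a = parse_xml(args.xml[0])
    print(a)

    for path in args.xml[1:]:
        b = parse_xml(path)
        print(b)


def parse_xml(path):
    # {ID: {attribute: value}} for every <Module> directly under the root
    return {m.attrib.pop('ID'): m.attrib for m in ElementTree.parse(path).findall('Module')}


if __name__ == '__main__':
    main()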
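One way to combine the dictionaries that parse_xml returns is a per-ID dictionary merge in which later files win, attribute by attribute. This is a minimal sketch, not the answer's method: the XML is inlined as strings so it runs on its own, and parse_xml_string, merge_dicts and to_xml are hypothetical helper names. Note that rebuilding the tree from dictionaries drops comments and the original whitespace.

```python
import xml.etree.ElementTree as ET

BASE = """<Base>
<Module ID="Module1" Prop1="A" Prop2="B" Prop3="C"/>
<!-- XML comment -->
<Module ID="Module2" Prop1="D" Prop2="E" Prop3="F"/>
</Base>"""

OVERRIDE = """<Override>
<Module ID="Module1" Prop2="B_ov" Prop4="ZZ"/>
<Module ID="Module2" Prop1="D_ov" Prop5="F"/>
</Override>"""


def parse_xml_string(text):
    """Same idea as parse_xml(), but from a string: {ID: {attr: value}}."""
    root = ET.fromstring(text)
    return {m.attrib.pop('ID'): m.attrib for m in root.findall('Module')}


def merge_dicts(base, *overrides):
    """Hypothetical helper: later dicts override earlier ones, attribute by attribute."""
    merged = {mid: dict(attrs) for mid, attrs in base.items()}
    for ov in overrides:
        for mid, attrs in ov.items():
            # setdefault also keeps modules that exist only in an override
            merged.setdefault(mid, {}).update(attrs)
    return merged


def to_xml(merged, root_tag='final'):
    """Hypothetical helper: build a new tree from the merged dict, ID written first."""
    root = ET.Element(root_tag)
    for mid, attrs in merged.items():
        ET.SubElement(root, 'Module', {'ID': mid, **attrs})
    return root


merged = merge_dicts(parse_xml_string(BASE), parse_xml_string(OVERRIDE))
print(merged['Module1'])
# prints {'Prop1': 'A', 'Prop2': 'B_ov', 'Prop3': 'C', 'Prop4': 'ZZ'}
print(ET.tostring(to_xml(merged), encoding='unicode'))
```

Because dict.update keeps the position of existing keys and appends new ones, overridden attributes stay where they were and new attributes such as Prop4 end up last, matching the desired output.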

Extended solution (the <!-- XML comment --> items are not preserved, because ElementTree's default parser discards comments):

import xml.etree.ElementTree as ET

base_tree = ET.parse('Base.xml')
base_root = base_tree.getroot()
override = ET.parse('Override.xml').getroot()

base_root.tag = 'final'   # set new `root` tag

for m in base_root.findall('Module[@ID]'):
    # finding the `overridden` Module element with respective `ID`
    repl_el = override.find('Module[@ID="{}"]'.format(m.get('ID')))
    if repl_el is None:
        continue   # no override for this module; keep the base attributes
    # override values replace existing attributes; new attributes are appended
    for k, v in repl_el.items():
        m.set(k, v)

# write() returns None, so there is nothing to print
base_tree.write('output.xml', encoding='unicode')

The final output.xml contents:

<final>
<Module ID="Module1" Prop1="A" Prop2="B_ov" Prop3="C" Prop4="ZZ" />

<Module ID="Module2" Prop1="D_ov" Prop2="E" Prop3="F" Prop5="F" />
</final>
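The solution above reads exactly one override file and drops the comments. A sketch of one way to extend it, assuming Python 3.8+ (where ET.TreeBuilder accepts insert_comments=True): overrides are applied in the order given, so later files win, and comments inside the base root element survive. merge_xml and parse_keeping_comments are hypothetical names, and appending modules that exist only in an override is an assumption, since the question does not say what should happen in that case.

```python
import io
import xml.etree.ElementTree as ET


def parse_keeping_comments(source):
    """Parse a file (name or file object), keeping comments inside the root.

    TreeBuilder's insert_comments parameter needs Python 3.8+.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.parse(source, parser=parser)


def merge_xml(base_source, *override_sources, root_tag='final'):
    """Hypothetical helper: apply the overrides to the base in order; later files win."""
    tree = parse_keeping_comments(base_source)
    root = tree.getroot()
    root.tag = root_tag
    # Comment nodes have tag ET.Comment, so this only picks up real <Module> elements
    modules = {m.get('ID'): m for m in root.findall('Module[@ID]')}

    for source in override_sources:
        for ov in ET.parse(source).getroot().findall('Module[@ID]'):
            target = modules.get(ov.get('ID'))
            if target is None:
                # Assumption: a module missing from the base is appended as a new element
                target = ET.SubElement(root, 'Module')
                modules[ov.get('ID')] = target
            for key, value in ov.items():
                target.set(key, value)
    return tree


base = io.StringIO('<Base><Module ID="Module1" Prop1="A" Prop2="B"/>'
                   '<!-- XML comment --><Module ID="Module2" Prop1="D"/></Base>')
override1 = io.StringIO('<Override><Module ID="Module1" Prop2="B_ov" Prop4="ZZ"/></Override>')
override2 = io.StringIO('<Override><Module ID="Module2" Prop1="D_ov"/></Override>')

tree = merge_xml(base, override1, override2)
output = ET.tostring(tree.getroot(), encoding='unicode')
print(output)
```

With the question's argparse setup this could be called as merge_xml(args.xml[0], *args.xml[1:]).write('output.xml', encoding='unicode'). Indexing the base modules by ID once avoids a find() call per module. Comments outside the root element (such as the one before <Override>, or the one before <final> in the desired output) are still lost, because TreeBuilder only inserts comments that appear inside the root.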

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM