Merging two XML files in Python: appending similar elements and copying elements that aren't present in one file

I want to merge two XML files. I have read many solutions, but they are specific to particular files. I am using xml.etree.ElementTree as well as lxml for parsing, comparing the files and getting the differences. As I understand it, my next step is:

for element in file2.xml:
    if element present in file1.xml:
        append to output_file.xml
    else:
        copy element to the output_file

But I haven't worked much with XML, and the merge tools I found require a license, so I need to write a generic script that merges the files into the format I want.
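The pseudocode can be made concrete with the standard library alone. The sketch below is one possible reading of it, not a definitive implementation. It assumes that "similar" means "same tag name": each top-level element of file2.xml is inserted right after the last element of file1.xml with the same tag, and elements whose tag does not occur in file1.xml are collected at the end of the root. The function names merge_roots and merge_files are hypothetical, and ET.indent requires Python 3.9 or later.

```python
import copy
import xml.etree.ElementTree as ET


def merge_roots(root1, root2):
    """Return a copy of root1 with root2's top-level children merged in.

    Assumption: elements are "similar" when they have the same tag.
    A child of root2 goes right after the last child with that tag;
    a child whose tag does not occur yet is appended at the end.
    """
    merged = copy.deepcopy(root1)  # leave the parsed input untouched
    for child in root2:
        # positions of the children that already carry this tag
        same_tag = [i for i, el in enumerate(merged) if el.tag == child.tag]
        if same_tag:
            merged.insert(same_tag[-1] + 1, copy.deepcopy(child))
        else:
            merged.append(copy.deepcopy(child))
    return merged


def merge_files(path1, path2, out_path):
    """Merge file2 into file1 and write the result (ET.indent needs Python 3.9+)."""
    root = merge_roots(ET.parse(path1).getroot(), ET.parse(path2).getroot())
    tree = ET.ElementTree(root)
    ET.indent(tree)  # re-indent: copied elements keep their old whitespace
    tree.write(out_path, encoding="UTF-8", xml_declaration=True)
```

Calling merge_files("file1.xml", "file2.xml", "output_file.xml") groups the grandpa and grandma elements as in the required output below. Because great_grandpa_name_two has a different tag from great_grandpa_name_one, this tag-based rule places it at the end of the root instead of next to great_grandpa_name_one; matching "similar" names more loosely would need a different comparison than el.tag == child.tag.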

file1.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

</great_grands>

file2.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>


    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

Required output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

Consider XSLT, the special-purpose declarative language (and sibling of XPath) designed to transform XML documents. Its document() function can load external XML files referenced by relative paths. Python's lxml module can run XSLT 1.0 scripts.

Because XSLT scripts are themselves well-formed XML, you can parse them from a file or from an embedded string. The example below assumes all files and scripts are saved in the same directory:

XSLT script (save it as XSLTScript.xsl; notice that only file2.xml is referenced):

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

 <xsl:template match="/great_grands">
   <xsl:copy>
     <xsl:copy-of select="great_grandpa_name_one"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
     <xsl:copy-of select="grandpa"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
     <xsl:copy-of select="grandma"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
   </xsl:copy>
 </xsl:template>

</xsl:transform>

Python script (notice that only file1.xml is referenced):

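from lxml import etree

# Parse the input document and the stylesheet (both in the working directory)
xml = etree.parse('file1.xml')
xsl = etree.parse('XSLTScript.xsl')

# Compile the stylesheet and apply it; its document('file2.xml') calls load the second file
transform = etree.XSLT(xsl)
newdom = transform(xml)

# SAVE NEW DOM STRING TO FILE: bytes() serializes using the xsl:output settings
with open('Output.xml', 'wb') as f:
    f.write(bytes(newdom))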

Output

<?xml version="1.0" encoding="UTF-8"?>
<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa>
    <grandpa_name>grandpa_name_one_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name>grandpa_name_two_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
  </grandpa>
  <grandma>
    <grandma_name>grandma_name_one_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name>grandma_name_two_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name_2>grandma_name_one_2</grandma_name_2>
  </grandma>
</great_grands>
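Since a stylesheet can also be parsed from an embedded string, the hard-coded element names and file name can be removed. The sketch below is a hypothetical generic variant, not part of the answer above. It assumes "similar" means "same element name": after the last file1.xml element of a given name it copies file2.xml's elements of that name, and it appends elements whose name never occurs in file1.xml at the end. A stylesheet parsed from a string has no file location of its own, so rather than rely on how a relative path would be resolved, the sketch passes the second file to document() as an absolute file:// URI through an XSLT parameter (named other here for illustration) using etree.XSLT.strparam().

```python
from pathlib import Path

from lxml import etree

# Hypothetical generic stylesheet: no element names or file names are hard-coded.
# Assumption: "similar" elements are top-level elements with the same name.
GENERIC_MERGE_XSL = """\
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- URI of the second document, supplied from Python -->
  <xsl:param name="other"/>
  <xsl:variable name="doc2" select="document($other)/*"/>

  <xsl:template match="/*">
    <xsl:variable name="root1" select="."/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="*">
        <xsl:copy-of select="."/>
        <!-- after the last element of this name, add doc2's elements of that name -->
        <xsl:if test="not(following-sibling::*[name() = name(current())])">
          <xsl:copy-of select="$doc2/*[name() = name(current())]"/>
        </xsl:if>
      </xsl:for-each>
      <!-- then doc2's elements whose name never occurs in the first document -->
      <xsl:for-each select="$doc2/*">
        <xsl:if test="not($root1/*[name() = name(current())])">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
"""


def merge_with_xslt(path1, path2):
    """Merge path2 into path1 with the generic stylesheet; return the result tree."""
    transform = etree.XSLT(etree.XML(GENERIC_MERGE_XSL))
    # The stylesheet comes from a string, so hand document() an absolute URI
    # instead of a path relative to some file location.
    other_uri = Path(path2).resolve().as_uri()
    return transform(etree.parse(str(path1)), other=etree.XSLT.strparam(other_uri))
```

Calling merge_with_xslt("file1.xml", "file2.xml") and writing bytes(result) to Output.xml, as in the Python script above, gives the same grandpa and grandma grouping as the hard-coded stylesheet. The one difference is that great_grandpa_name_two, whose name does not occur in file1.xml, ends up last instead of second.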
