

Merging two XML files in Python: appending elements next to similar elements and moving over elements that aren't present in one file

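I want to merge two XML files. I have read many solutions, but they are specific to those particular files. I am using xml.etree.ElementTree as well as lxml to parse and compare the files and get the differences. As I understand it, the next step is: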

for element in file2.xml:
    if a similar element is present in file1.xml:
        append it alongside that element in output_file.xml
    else:
        copy the element to output_file.xml

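But I haven't worked much with XML, and the merge tools are licensed, so I need to write a generic script that merges the files into the required format.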

file1.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

</great_grands>

file2.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>


    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

Required output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>
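A minimal standard-library sketch of the plan in the pseudocode, assuming "similar" means "has the same tag name" among the root's direct children. merge_by_tag is a hypothetical helper, not part of any library, and ET.indent needs Python 3.9+. Because unmatched elements are simply appended, great_grandpa_name_two ends up last rather than second as in the required output; putting unmatched elements elsewhere would need an extra rule.

```python
import copy
import xml.etree.ElementTree as ET


def merge_by_tag(root1, root2):
    """Merge root2's direct children into root1 (in place) and return root1.

    Hypothetical helper: a child whose tag already occurs among root1's
    children is inserted right after the last such child; any other child
    is appended at the end.
    """
    for element in root2:
        positions = [i for i, child in enumerate(root1) if child.tag == element.tag]
        new_child = copy.deepcopy(element)  # copy so root2 is left untouched
        if positions:
            root1.insert(positions[-1] + 1, new_child)
        else:
            root1.append(new_child)
    return root1


# Trimmed sample data from the question; for real files use ET.parse('file1.xml').getroot()
FILE1 = """<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <grandpa><grandpa_name>grandpa_name_one_1</grandpa_name></grandpa>
  <grandpa><grandpa_name>grandpa_name_two_1</grandpa_name></grandpa>
  <grandma><grandma_name>grandma_name_one_1</grandma_name></grandma>
  <grandma><grandma_name>grandma_name_two_1</grandma_name></grandma>
</great_grands>"""

FILE2 = """<great_grands>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa><grandpa_name_2>grandpa_name_one_2</grandpa_name_2></grandpa>
  <grandma><grandma_name_2>grandma_name_one_2</grandma_name_2></grandma>
</great_grands>"""

merged = merge_by_tag(ET.fromstring(FILE1), ET.fromstring(FILE2))
ET.indent(merged)  # Python 3.9+: re-indent after the inserts
print(ET.tostring(merged, encoding="unicode"))

# To save instead of printing:
# ET.ElementTree(merged).write('output_file.xml', encoding='UTF-8', xml_declaration=True)
```

With the sample data, the merged root's children come out as great_grandpa_name_one, three grandpa elements, three grandma elements, and finally great_grandpa_name_two.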

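Consider XSLT, the special-purpose declarative language (a sibling of XPath) designed to transform XML files. With its document() function, it can parse external XML files referenced by relative paths. Python's lxml module can run XSLT 1.0 scripts.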

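And because XSLT scripts are well-formed XML files, you can parse them either from a file or from an embedded string. The following assumes all files and scripts are saved in the same directory: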
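Since the stylesheet is plain XML, it can also live inside the Python script. A minimal sketch, assuming lxml is installed, that embeds the answer's stylesheet as a string and writes trimmed copies of the sample files to a temporary directory so it runs on its own. In XSLT 1.0, a relative document('file2.xml') call is resolved against the base URI of the stylesheet, and a stylesheet parsed from a string has no file location, so base_url (pointing at a hypothetical XSLTScript.xsl in that directory) is passed to etree.XML to make the lookup land next to file1.xml instead of depending on the working directory.

```python
import pathlib
import tempfile

from lxml import etree

# The answer's stylesheet, embedded as a string instead of a separate .xsl file
XSLT = """<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
 <xsl:template match="/great_grands">
   <xsl:copy>
     <xsl:copy-of select="great_grandpa_name_one"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
     <xsl:copy-of select="grandpa"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
     <xsl:copy-of select="grandma"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
   </xsl:copy>
 </xsl:template>
</xsl:transform>"""

# Trimmed copies of the question's sample files
FILE1 = """<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <grandpa><grandpa_name>grandpa_name_one_1</grandpa_name></grandpa>
  <grandpa><grandpa_name>grandpa_name_two_1</grandpa_name></grandpa>
  <grandma><grandma_name>grandma_name_one_1</grandma_name></grandma>
  <grandma><grandma_name>grandma_name_two_1</grandma_name></grandma>
</great_grands>"""

FILE2 = """<great_grands>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa><grandpa_name_2>grandpa_name_one_2</grandpa_name_2></grandpa>
  <grandma><grandma_name_2>grandma_name_one_2</grandma_name_2></grandma>
</great_grands>"""

with tempfile.TemporaryDirectory() as tmp:
    workdir = pathlib.Path(tmp)
    (workdir / "file1.xml").write_text(FILE1, encoding="utf-8")
    (workdir / "file2.xml").write_text(FILE2, encoding="utf-8")

    # base_url gives the in-memory stylesheet a location, so the relative
    # document('file2.xml') calls are resolved inside workdir
    xsl = etree.XML(XSLT, base_url=(workdir / "XSLTScript.xsl").as_uri())
    transform = etree.XSLT(xsl)
    # file2.xml is read during the transform, while the directory still exists
    newdom = transform(etree.parse(str(workdir / "file1.xml")))

print(str(newdom))
```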

XSLT script (save it as XSLTScript.xsl; note that only file2.xml is referenced)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

 <xsl:template match="/great_grands">
   <xsl:copy>
     <xsl:copy-of select="great_grandpa_name_one"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
     <xsl:copy-of select="grandpa"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
     <xsl:copy-of select="grandma"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
   </xsl:copy>
 </xsl:template>

</xsl:transform>

Python script (note that only file1.xml is referenced)

from lxml import etree

xml = etree.parse('file1.xml')
xsl = etree.parse('XSLTScript.xsl')

transform = etree.XSLT(xsl)
newdom = transform(xml)

# Serialize the result tree (honouring xsl:output) and save it to a file
with open('Output.xml', 'wb') as f:
    f.write(bytes(newdom))

Output

<?xml version="1.0" encoding="UTF-8"?>
<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa>
    <grandpa_name>grandpa_name_one_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name>grandpa_name_two_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
  </grandpa>
  <grandma>
    <grandma_name>grandma_name_one_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name>grandma_name_two_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name_2>grandma_name_one_2</grandma_name_2>
  </grandma>
</great_grands>
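If adding lxml is not an option, the stylesheet's logic can be mirrored with xml.etree.ElementTree from the standard library. A sketch under the same assumption the stylesheet makes, namely that the element names and their order are known in advance; ORDER and merge_fixed_order are hypothetical names. Each pair plays the role of two consecutive xsl:copy-of lines: copy file1's children with the first tag, then file2's children with the second. ET.indent needs Python 3.9+.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical mapping mirroring the stylesheet's xsl:copy-of pairs:
# (tag taken from file1, tag taken from file2), in output order
ORDER = [
    ("great_grandpa_name_one", "great_grandpa_name_two"),
    ("grandpa", "grandpa"),
    ("grandma", "grandma"),
]


def merge_fixed_order(root1, root2, order):
    """Return a new root holding, for each pair, file1's matches then file2's."""
    out = ET.Element(root1.tag)
    for tag1, tag2 in order:
        for child in root1.findall(tag1):  # direct children only, like select="grandpa"
            out.append(copy.deepcopy(child))
        for child in root2.findall(tag2):
            out.append(copy.deepcopy(child))
    return out


# Trimmed sample data from the question
FILE1 = """<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <grandpa><grandpa_name>grandpa_name_one_1</grandpa_name></grandpa>
  <grandpa><grandpa_name>grandpa_name_two_1</grandpa_name></grandpa>
  <grandma><grandma_name>grandma_name_one_1</grandma_name></grandma>
  <grandma><grandma_name>grandma_name_two_1</grandma_name></grandma>
</great_grands>"""

FILE2 = """<great_grands>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa><grandpa_name_2>grandpa_name_one_2</grandpa_name_2></grandpa>
  <grandma><grandma_name_2>grandma_name_one_2</grandma_name_2></grandma>
</great_grands>"""

merged = merge_fixed_order(ET.fromstring(FILE1), ET.fromstring(FILE2), ORDER)
ET.indent(merged)  # Python 3.9+: tidy whitespace for printing
print(ET.tostring(merged, encoding="unicode"))
```

The resulting child order matches the Output above: great_grandpa_name_one, great_grandpa_name_two, three grandpa elements, then three grandma elements.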
