Merging two XML files in Python: appending similar elements and moving elements that aren't present in one file
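I want to merge two XML files. I have read many solutions, but they are specific to the files in those questions. I'm using xml.etree.ElementTree and lxml to parse the files, compare them, and get the differences. As I understand it, the next step is: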
for element in file2.xml:
    if element present in file1.xml:
        append to output_file.xml
    else:
        copy element to the output_file
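However, I haven't worked much with XML, and the available merge tools are licensed, so I need to write a generic script that merges the files into the required format.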
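The pseudocode above can be fleshed out with only the standard library's xml.etree.ElementTree. This is a minimal sketch, not taken from the answer, and it rests on assumptions: "present in file1.xml" is taken to mean "file1.xml's root already has a direct child with the same tag", and the helper name `merge_children` is hypothetical. A copy of a matching element is inserted after the last sibling with that tag; an element with no match is placed before the first file1.xml element whose tag appears later in file2.xml, which reproduces the desired ordering for the sample files below.

```python
import copy
import xml.etree.ElementTree as ET

# Sample data inlined so the sketch runs on its own (same content as file1.xml / file2.xml).
FILE1 = """<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <grandpa><grandpa_name>grandpa_name_one_1</grandpa_name></grandpa>
  <grandpa><grandpa_name>grandpa_name_two_1</grandpa_name></grandpa>
  <grandma><grandma_name>grandma_name_one_1</grandma_name></grandma>
  <grandma><grandma_name>grandma_name_two_1</grandma_name></grandma>
</great_grands>"""

FILE2 = """<great_grands>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa><grandpa_name_2>grandpa_name_one_2</grandpa_name_2></grandpa>
  <grandma><grandma_name_2>grandma_name_one_2</grandma_name_2></grandma>
</great_grands>"""


def merge_children(root1, root2):
    """Hypothetical helper: copy every child of root2 into root1 (in place), grouped by tag."""
    children2 = list(root2)
    for pos, child in enumerate(children2):
        kids = list(root1)
        same_tag = [i for i, k in enumerate(kids) if k.tag == child.tag]
        if same_tag:
            # "Present in file1": put the copy right after the last sibling with this tag.
            idx = same_tag[-1] + 1
        else:
            # Not present: put it before the first root1 child whose tag occurs
            # later in file2; if there is none, append it at the end.
            later = {c.tag for c in children2[pos + 1:]}
            idx = next((i for i, k in enumerate(kids) if k.tag in later), len(kids))
        root1.insert(idx, copy.deepcopy(child))
    return root1


merged = merge_children(ET.fromstring(FILE1), ET.fromstring(FILE2))
ET.indent(merged)  # re-indent whitespace for readability; needs Python 3.9+
print(ET.tostring(merged, encoding="unicode"))
```

With real files, something like `tree = ET.parse('file1.xml')`, then `merge_children(tree.getroot(), ET.parse('file2.xml').getroot())` and `tree.write('output_file.xml', encoding='UTF-8', xml_declaration=True)` should do the same. The rule only looks at the root's direct children; deeper structures would need the function to recurse into matching elements.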
file1.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>
    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>
    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>
</great_grands>
file2.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>
    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>
</great_grands>
Desired output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>
    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>
    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>
</great_grands>
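Consider XSLT, the special-purpose, declarative language (a sibling of XPath) designed to transform XML files. With its document() function, it can parse external XML files referenced by relative links. Python's lxml module can run XSLT 1.0 scripts.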
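Since an XSLT script is itself a well-formed XML file, you can parse it either from a file or from an embedded string. The following assumes all files and scripts are saved in the same directory: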
XSLT script (save as a .xsl file; note that only file2.xml is referenced)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>

    <xsl:template match="/great_grands">
        <xsl:copy>
            <xsl:copy-of select="great_grandpa_name_one"/>
            <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
            <xsl:copy-of select="grandpa"/>
            <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
            <xsl:copy-of select="grandma"/>
            <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
        </xsl:copy>
    </xsl:template>
</xsl:transform>
Python script (note that only file1.xml is referenced)
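from lxml import etree

# PARSE THE SOURCE XML AND THE XSLT STYLESHEET
xml = etree.parse('file1.xml')
xsl = etree.parse('XSLTScript.xsl')

# BUILD AND RUN THE TRANSFORMATION
transform = etree.XSLT(xsl)
newdom = transform(xml)

# SAVE NEW DOM STRING TO FILE (bytes() serializes according to <xsl:output>)
with open('Output.xml', 'wb') as f:
    f.write(bytes(newdom))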
Output
<?xml version="1.0" encoding="UTF-8"?>
<great_grands>
    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>
    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>
</great_grands>
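Because the stylesheet can also be parsed from an embedded string, here is a self-contained variant. It is a sketch under assumptions, not the answer's exact code: the stylesheet lives in a Python string, the sample files are written to a temporary directory, and the second file's location is passed in as an XSLT parameter named `other` (a hypothetical name) through lxml's `etree.XSLT.strparam()`, so the stylesheet no longer hard-codes file2.xml. Passing an absolute file URI avoids any question of what a relative path in document() resolves against when the stylesheet has no file of its own.

```python
import tempfile
from pathlib import Path

from lxml import etree

# Same merge as the answer's stylesheet, but file2's location comes from the
# (hypothetical) global parameter "other" instead of a hard-coded name.
XSLT_TEXT = """<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="other"/>
  <xsl:template match="/great_grands">
    <xsl:variable name="doc2" select="document($other)/great_grands"/>
    <xsl:copy>
      <xsl:copy-of select="great_grandpa_name_one"/>
      <xsl:copy-of select="$doc2/great_grandpa_name_two"/>
      <xsl:copy-of select="grandpa"/>
      <xsl:copy-of select="$doc2/grandpa"/>
      <xsl:copy-of select="grandma"/>
      <xsl:copy-of select="$doc2/grandma"/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>"""

FILE1 = """<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <grandpa><grandpa_name>grandpa_name_one_1</grandpa_name></grandpa>
  <grandpa><grandpa_name>grandpa_name_two_1</grandpa_name></grandpa>
  <grandma><grandma_name>grandma_name_one_1</grandma_name></grandma>
  <grandma><grandma_name>grandma_name_two_1</grandma_name></grandma>
</great_grands>"""

FILE2 = """<great_grands>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa><grandpa_name_2>grandpa_name_one_2</grandpa_name_2></grandpa>
  <grandma><grandma_name_2>grandma_name_one_2</grandma_name_2></grandma>
</great_grands>"""

with tempfile.TemporaryDirectory() as tmp:
    file1 = Path(tmp, "file1.xml")
    file2 = Path(tmp, "file2.xml")
    file1.write_text(FILE1, encoding="utf-8")
    file2.write_text(FILE2, encoding="utf-8")

    # The string has no encoding declaration, so etree.XML() accepts it as str.
    transform = etree.XSLT(etree.XML(XSLT_TEXT))
    # strparam() passes the value as a string literal rather than an XPath expression.
    newdom = transform(etree.parse(str(file1)), other=etree.XSLT.strparam(file2.as_uri()))

merged = newdom.getroot()
print(bytes(newdom).decode("utf-8"))
```

The same parameter trick lets one stylesheet merge any pair of files with this layout, which moves it a step closer to the generic script the question asks for; the element names themselves are still spelled out in the stylesheet.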