XSLT to Merge 2 XML Files

I know there are a few XML/XSLT merge questions here, but none of them seems to solve the problem I have.

What I am looking for is an XSLT stylesheet (as generic as possible, not tied to the structure of the input XML files) that can merge a.xml with b.xml and generate c.xml in such a way that:

  • c.xml contains the nodes common to a.xml and b.xml (with the node values taken from a.xml)
  • in addition, c.xml contains the nodes (and values) that are present in b.xml but not in a.xml

For example, merging a.xml:

<root_node>
  <settings>
    <setting1>a1</setting1>
    <setting2>a2</setting2>
    <setting3>
      <setting31>a3</setting31>
    </setting3>
    <setting4>a4</setting4>
  </settings>
</root_node>

with b.xml:

<root_node>
  <settings>
    <setting1>b1</setting1>
    <setting2>b2</setting2>
    <setting3>
      <setting31>b3</setting31>
    </setting3>
    <setting5 id="77">b5</setting5>
  </settings>
</root_node>

will generate c.xml:

<root_node>
  <settings>
    <setting1>a1</setting1>
    <setting2>a2</setting2>
    <setting3>
      <setting31>a3</setting31>
    </setting3>
    <setting5 id="77">b5</setting5>
  </settings>
</root_node>

Additional Information

I will try to explain what I mean by a "common node". This might not be an accurate XML/XSLT definition since I am not an expert in either.

/root_node/settings/setting1 in a.xml is a "common node" with /root_node/settings/setting1 in b.xml, since the two nodes are reached using the same path. The same goes for setting2 and setting3.

The two "non-common nodes" are /root_node/settings/setting4, which is found only in a.xml (it should not appear in the output), and /root_node/settings/setting5, which is found only in b.xml (it should appear in the output).

By "generic solution" I don't mean something that will work whatever format the input XML files have. What I mean is that the XSLT should not contain hard-coded XPaths; you may add restrictions like "this will work only if the nodes in a.xml are unique" or whatever other restriction you think is suitable.

The basic technique for operating on multiple files is the document() function. It is used like this:

<xsl:variable name="var1" select="document('http://example.com/file1.xml', /)"/>
<xsl:variable name="var2" select="document('http://example.com/file2.xml', /)"/>

Once you have the two documents, you can use their contents as if they were part of the same document.
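
As a rough illustration of the above (not the full merge), the sketch below loads a.xml with document() and reads one value from each document. It assumes that a.xml sits next to the stylesheet and that the transform is applied to b.xml. The <compare>, <fromA> and <fromB> wrapper elements are made up for this example, and the hard-coded paths are only there to keep it short.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- assumption: a relative URI given as a string is resolved against the stylesheet's location -->
  <xsl:variable name="a" select="document('a.xml')" />

  <xsl:template match="/">
    <compare>
      <!-- value read from the document loaded via document() -->
      <fromA><xsl:value-of select="$a/root_node/settings/setting1" /></fromA>
      <!-- value read from the main input document (b.xml) -->
      <fromB><xsl:value-of select="/root_node/settings/setting1" /></fromB>
    </compare>
  </xsl:template>
</xsl:stylesheet>

With the sample files from the question, <fromA> should contain a1 and <fromB> should contain b1.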

The following XSLT 1.0 program does what you want.

Apply it to b.xml and pass in the path to a.xml as the aXmlPath parameter.

Here is how it works.

  1. It traverses B, as that contains the new nodes that you want to keep as well as the elements common to A and B.
    1. I define "common element" as any element that has the same simple path in both documents.
    2. I define "simple path" as the slash-delimited list of the names of an element's ancestors and of the element itself, i.e. the names along the ancestor-or-self axis.
      So in your sample B, <setting31> would have a simple path of root_node/settings/setting3/setting31/.
    3. Note that this path is ambiguous, since it carries no positional information. The implication is that your input cannot have two elements with the same name that share the same parent. Based on your samples I presume that this will not occur.
  2. For every leaf text node (any text node in an element that has no child elements):
    1. The simple path is calculated with a template called calculatePath.
    2. The recursive template nodeValueByPath is then called; it tries to retrieve the text value at the corresponding simple path in the other document.
    3. If a corresponding text node is found, its value is used. This satisfies your first bullet point.
    4. If no corresponding node is found, the value at hand is used, i.e. the value from B. This satisfies your second bullet point.
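
The "simple path" from step 1 can be made visible with a small helper stylesheet. This is only a sketch for inspecting an input file and is not part of the solution itself: it prints the simple path of every leaf element, and it builds each path with a loop over the ancestor-or-self axis rather than with the recursive calculatePath template that the solution uses.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- leaf elements: elements without child elements -->
    <xsl:for-each select="//*[not(*)]">
      <!-- xsl:for-each visits the selected nodes in document order,
           so the names are emitted from the root element downwards -->
      <xsl:for-each select="ancestor-or-self::*">
        <xsl:value-of select="concat(name(), '/')" />
      </xsl:for-each>
      <!-- one path per line -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Applied to the sample b.xml, it should print one line per leaf element: root_node/settings/setting1/, root_node/settings/setting2/, root_node/settings/setting3/setting31/ and root_node/settings/setting5/. A path that shows up more than once would indicate the ambiguity described in step 1.3.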

As a result, the new document matches B's structure and contains:

  • all text node values from B that have no corresponding node in A.
  • text node values from A wherever a corresponding node exists in B.

Here's the XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:param name="aXmlPath" select="''" />
  <xsl:param name="aDoc"     select="document($aXmlPath)" />

  <xsl:template match="@* | node()">
    <xsl:copy>
       <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- text nodes will be checked against doc A -->
  <xsl:template match="*[not(*)]/text()">
    <xsl:variable name="path">
      <xsl:call-template name="calculatePath" />
    </xsl:variable>

    <xsl:variable name="valueFromA">
      <xsl:call-template name="nodeValueByPath">
        <xsl:with-param name="path"    select="$path" />
        <xsl:with-param name="context" select="$aDoc" />
      </xsl:call-template>
    </xsl:variable>

    <xsl:choose>
      <!-- either there is something at that path in doc A -->
      <xsl:when test="starts-with($valueFromA, 'found:')">
        <!-- remove prefix added in nodeValueByPath, see there --> 
        <xsl:value-of select="substring-after($valueFromA, 'found:')" />
      </xsl:when>
      <!-- or we take the value from doc B -->
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- this calculates a simple path for a node -->
  <xsl:template name="calculatePath">
    <xsl:for-each select="..">
      <xsl:call-template name="calculatePath" />
    </xsl:for-each>
    <xsl:if test="self::*">
      <xsl:value-of select="concat(name(), '/')" />
    </xsl:if>
  </xsl:template>

  <!-- this retrieves a node value by its simple path -->
  <xsl:template name="nodeValueByPath">
    <xsl:param name="path"    select="''" />
    <!-- default is an empty node-set, so count($context) is valid even if no context is passed -->
    <xsl:param name="context" select="/.." />

    <xsl:if test="contains($path, '/') and count($context)">
      <xsl:variable name="elemName" select="substring-before($path, '/')" />
      <xsl:variable name="nextPath" select="substring-after($path, '/')" />
      <xsl:variable name="currContext" select="$context/*[name() = $elemName][1]" />

      <xsl:if test="$currContext">
        <xsl:choose>
          <xsl:when test="contains($nextPath, '/')">
            <xsl:call-template name="nodeValueByPath">
              <xsl:with-param name="path"    select="$nextPath" />
              <xsl:with-param name="context" select="$currContext" />
            </xsl:call-template>
          </xsl:when>
          <xsl:when test="not($currContext/*)">
            <!-- always add a prefix so we can detect 
                 the case "exists in A, but is empty" -->
            <xsl:value-of select="concat('found:', $currContext/text())" />
          </xsl:when>
        </xsl:choose>
      </xsl:if>
    </xsl:if>    
  </xsl:template>
</xsl:stylesheet>
