
Merge XML contents of elements with the same parent attribute values using XSLT 2.0

I have two XML files:

file1.xml

<?xml version="1.0" encoding="UTF-8"?>
<tv>
...
  <programme start="20200814040000 +0000" stop="20200814050000 +0000" channel="A">
    <title>A</title>
    <sub-title>C</sub-title>
    <desc>F</desc>
  </programme>
...
  <programme start="20200814090000 +0000" stop="20200814093000 +0000" channel="A">
    <title>B</title>
    <sub-title>D</sub-title>
    <desc>E</desc>
  </programme>
...
</tv>

file2.xml

<?xml version="1.0" encoding="UTF-8"?>
<tv>
...
  <programme start="20200814040000 +0000" stop="20200814050000 +0000" channel="A">
    <title>G</title>
    <sub-title>C</sub-title>
    <desc>H</desc>
    <episode-num system="onscreen">S9 E13</episode-num>
  </programme>
...
  <programme start="20200814090000 +0000" stop="20200814093000 +0000" channel="A">
    <title>K</title>
    <sub-title>L</sub-title>
    <desc>M</desc>
    <episode-num system="onscreen">S3 E2</episode-num>
  </programme>  
...
</tv>

I would like an XSLT 2.0 stylesheet that produces a new file:

file3.xml

<?xml version="1.0" encoding="UTF-8"?>
<tv>
...
  <programme start="20200814040000 +0000" stop="20200814050000 +0000" channel="A">
    <title>A (G)</title>
    <sub-title>C</sub-title>
    <desc>F (H)</desc>
    <episode-num system="onscreen">S9 E13</episode-num>
  </programme>
...
  <programme start="20200814090000 +0000" stop="20200814093000 +0000" channel="A">
    <title>B (K)</title>
    <sub-title>D (L)</sub-title>
    <desc>E (M)</desc>
    <episode-num system="onscreen">S3 E2</episode-num>
  </programme>
...
</tv>

I experimented a little bit, but I couldn't get the expected output. Any help would be appreciated.

Edit (for precision):

When the programme attributes are the same in both files:

  1. merge the child elements that are present in both files into one element in the new file, and if their text contents are NOT the same, put the second file's contents in parentheses
  2. if a child element is present in only one of the files, include it in the new file as well

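In XSLT 3, the function for-each-pair can help: it pairs the n-th programme of the first document with the n-th programme of the second, and the mf:merge-pair function below only merges a pair whose attributes are all equal: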

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="#all"
  expand-text="yes">
  
  <xsl:param name="doc2">
<tv>
  <channel id="Discovery">
    <display-name lang="el">Discovery</display-name>
  </channel>
  <programme start="20200814040000 +0000" stop="20200814050000 +0000" channel="Discovery">
    <title lang="el">Wheeler Dealers</title>
    <sub-title lang="el">BMW Isetta</sub-title>
    <desc lang="el">Mike tracks down an Isetta Bubble. </desc>
    <episode-num system="onscreen">S9 E13</episode-num>
  </programme>
</tv>
  </xsl:param>
  
  <xsl:output indent="yes"/>
  
  <xsl:function name="mf:merge-pair">
    <xsl:param name="programme1"/>
    <xsl:param name="programme2"/>
    <xsl:if test="deep-equal($programme1/@*, $programme2/@*)">
      <xsl:copy select="$programme1">
        <xsl:apply-templates select="@*"/>
        <xsl:for-each-group select="$programme1/*, $programme2/*" composite="yes" group-by="node-name(), @*">
          <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:value-of select="head(current-group()), tail(current-group()) ! ('(' || . || ')')"/>
          </xsl:copy>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:if>
  </xsl:function>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="tv">
    <xsl:copy>
      <xsl:apply-templates select="@*, channel"/>
      <xsl:sequence
         select="for-each-pair(programme, $doc2/tv/programme, mf:merge-pair#2)"/>      
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

In the above example I have inlined the second document for completeness and self-containedness, but in a real-life application you would of course use e.g. <xsl:param name="doc2" select="doc('input2.xml')"/>.

XSLT 3 with for-each-pair is available in all editions of Saxon 10, in the commercial editions of Saxon 9.8 and 9.9, and in Saxon-JS 2 (for Node.js or the browser).
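To make the positional pairing explicit, here is a minimal standalone sketch (XSLT 3.0, no source document; with Saxon's command line it can be started at xsl:initial-template via the -it option). The string values are hypothetical stand-ins for programme elements. It shows that for-each-pair combines the n-th item of the first sequence with the n-th item of the second and silently ignores surplus items in the longer sequence, so the programme elements of both files need to appear in the same order for mf:merge-pair to line them up.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- hypothetical values: 'A' is paired with 'G', 'B' with 'K' -->
    <xsl:variable name="pairs"
      select="for-each-pair(('A', 'B'), ('G', 'K', 'X'),
                            function($a, $b) { $a || ' (' || $b || ')' })"/>
    <!-- 'X' has no partner in the first sequence, so it is dropped -->
    <xsl:value-of select="$pairs" separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

This should print A (G) and B (K) on separate lines; the unmatched X never reaches the function.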

As for your comment: it seems you have edited the samples, and now duplicated contents like BMW Isetta (BMW Isetta) are supposed to be eliminated, so you could change

 <xsl:value-of select="head(current-group()), tail(current-group()) ! ('(' || . || ')')"/>

to

<xsl:value-of select="let $values := distinct-values(current-group()) return (head($values), tail($values) ! ('(' || . || ')'))"/>
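A small sketch to try the let expression above in isolation. The function name mf:combine is hypothetical; it simply wraps the same expression so it can be fed plain strings instead of current-group(). One caveat: the XPath 3.1 functions specification leaves the order of the distinct-values result implementation-dependent; the output reported below with Saxon HE 10.1 shows it keeping the first occurrence first, which is what puts the first file's text outside the parentheses.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  version="3.0"
  exclude-result-prefixes="#all">

  <xsl:output method="text"/>

  <!-- hypothetical helper: same expression as in the answer, applied to a plain string sequence -->
  <xsl:function name="mf:combine" as="xs:string">
    <xsl:param name="group" as="xs:string*"/>
    <xsl:value-of select="let $values := distinct-values($group)
                          return (head($values), tail($values) ! ('(' || . || ')'))"/>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <!-- identical texts (like sub-title C in both files) collapse to one value -->
    <xsl:value-of select="mf:combine(('C', 'C'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- differing texts: the second one is put in parentheses -->
    <xsl:value-of select="mf:combine(('F', 'H'))"/>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this should print C on the first line and F (H) on the second; the space before the parenthesis comes from the default single-space separator that xsl:value-of uses when its select returns more than one item.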

The output I get with your edited samples and Saxon HE 10.1 is:

<tv>
   <programme start="20200814040000 +0000"
              stop="20200814050000 +0000"
              channel="A">
      <title>A (G)</title>
      <sub-title>C</sub-title>
      <desc>F (H)</desc>
      <episode-num system="onscreen">S9 E13</episode-num>
   </programme>
   <programme start="20200814090000 +0000"
              stop="20200814093000 +0000"
              channel="A">
      <title>B (K)</title>
      <sub-title>D (L)</sub-title>
      <desc>E (M)</desc>
      <episode-num system="onscreen">S3 E2</episode-num>
   </programme>
</tv>

The complete stylesheet is:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    expand-text="yes">
    
    <xsl:param name="doc2" select="doc('file2.xml')"/>
    
    <xsl:output indent="yes"/>
    
    <!-- merges two programme elements, but only if all their attributes are equal -->
    <xsl:function name="mf:merge-pair">
        <xsl:param name="programme1"/>
        <xsl:param name="programme2"/>
        <xsl:if test="deep-equal($programme1/@*, $programme2/@*)">
            <xsl:copy select="$programme1">
                <xsl:apply-templates select="@*"/>
                <!-- one group per child element name plus attribute values (e.g. system="onscreen") -->
                <xsl:for-each-group select="$programme1/*, $programme2/*" composite="yes" group-by="node-name(), @*">
                    <xsl:copy>
                        <xsl:apply-templates select="@*"/>
                        <!-- first text as is, any further distinct text in parentheses -->
                        <xsl:value-of select="let $values := distinct-values(current-group()) return (head($values), tail($values) ! ('(' || . || ')'))"/>
                    </xsl:copy>
                </xsl:for-each-group>
            </xsl:copy>
        </xsl:if>
    </xsl:function>
    
    <xsl:mode on-no-match="shallow-copy"/>
    
    <xsl:template match="tv">
        <xsl:copy>
            <xsl:apply-templates select="@*, channel"/>
            <!-- pairs the n-th programme of this document with the n-th programme of $doc2 -->
            <xsl:sequence
                select="for-each-pair(programme, $doc2/tv/programme, mf:merge-pair#2)"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>

I would do something like:

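<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

<xsl:output indent="yes"/>

<xsl:variable name="file1" select="doc('file1.xml')"/>
<xsl:variable name="file2" select="doc('file2.xml')"/>

<xsl:template name="xsl:initial-template">
  <tv>
    <xsl:copy-of select="$file1/tv/channel"/>
    <!-- one group per stop/start/channel combination; file1's programme comes first -->
    <xsl:for-each-group select="$file1/tv/programme, $file2/tv/programme"
        group-by="@stop, @start, @channel" composite="yes">
      <programme>
        <xsl:copy-of select="@*"/>
        <!-- group the children of the matching programmes by element name, file1's first -->
        <xsl:for-each-group select="current-group() ! *" group-by="node-name()">
          <xsl:element name="{name()}">
            <xsl:copy-of select="current-group()/@*"/>
            <xsl:value-of select="current-group()[1]"/>
            <!-- append the second file's text in parentheses only if it differs -->
            <xsl:for-each select="current-group()[2][string(.) ne string(current-group()[1])]">
              <xsl:value-of select="concat(' (', ., ')')"/>
            </xsl:for-each>
          </xsl:element>
        </xsl:for-each-group>
      </programme>
    </xsl:for-each-group>
  </tv>
</xsl:template>

</xsl:stylesheet>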

Not tested.
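Since the question asks for XSLT 2.0, while composite grouping keys, xsl:initial-template and the ! and || operators all need XSLT 3.0, here is a sketch of the same grouping idea restricted to XSLT 2.0 features. Assumptions: the file names file1.xml and file2.xml from the question, a named entry template called main (hypothetical name, started e.g. with Saxon's -it:main), and at most one child element of each name per programme. The composite key is replaced by a single string key joined with a | separator, and the parenthesised text is only added when it differs from the first file's text, which also avoids relying on the order of distinct-values.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- file names taken from the question -->
  <xsl:variable name="file1" select="doc('file1.xml')"/>
  <xsl:variable name="file2" select="doc('file2.xml')"/>

  <!-- hypothetical entry point name; XSLT 2.0 has no xsl:initial-template convention -->
  <xsl:template name="main">
    <tv>
      <xsl:copy-of select="$file1/tv/channel"/>
      <!-- no composite keys in 2.0: join the three attributes into one string key -->
      <xsl:for-each-group select="$file1/tv/programme, $file2/tv/programme"
                          group-by="concat(@start, '|', @stop, '|', @channel)">
        <programme>
          <!-- attributes of the first programme in the group (from file1 if present) -->
          <xsl:copy-of select="@*"/>
          <!-- a for expression keeps file1's children ahead of file2's -->
          <xsl:for-each-group select="for $p in current-group() return $p/*"
                              group-by="node-name(.)">
            <xsl:copy>
              <xsl:copy-of select="current-group()/@*"/>
              <xsl:variable name="first" select="string(current-group()[1])"/>
              <!-- first text as is, later texts in parentheses only if they differ -->
              <xsl:value-of select="$first,
                  for $v in current-group()[position() gt 1][string(.) ne $first]
                  return concat('(', $v, ')')"/>
            </xsl:copy>
          </xsl:for-each-group>
        </programme>
      </xsl:for-each-group>
    </tv>
  </xsl:template>

</xsl:stylesheet>
```

Child elements that occur in only one file form a single-item group and are copied with their text unchanged, which covers the second rule of the question; programmes that have no counterpart in the other file are likewise copied through on their own.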
