
Create XML with only distinct paths using XSLT

I want to create an XML document that contains only the element names that have a distinct path in the source document. The values are not important here; they can be empty or dummy values. In other words, the resulting document should contain only the (node) essence of the source document.

I have found an answer (link: How to list complete XML document using XSLT) to a quite similar problem that points in the right direction, but it is not exactly what I am looking for.

Using a modified example of the mentioned post:

Source Document:

<?xml version="1.0" encoding="UTF-8"?>
<MediaCatalog name="AccessoriesCatalog">
    <Category Definition="AccessoriesCategory"
    name="1532" id="1532">
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16115" id="16115">
        <ParentCategory>1532</ParentCategory>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16116" id="16116">
        <ParentCategory>16115</ParentCategory>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16126" id="16126">
        <ParentCategory>16115</ParentCategory>
            <genre>
                <id>17</id>
                <name>Fairy Tales</name>
           </genre>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16131" id="16131">
        <ParentCategory>1532</ParentCategory>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16132" id="16132">
        <ParentCategory>16131</ParentCategory>
            <language>
                <id>1</id>
                <name>English</name>
                <shortName>EN</shortName>
            </language>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16136" id="16136">
        <ParentCategory>16131</ParentCategory>
            <genre>
                <id>18</id>
                <name>Thriller</name>
           </genre>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16139" id="16139">
        <ParentCategory>16131</ParentCategory>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16144" id="16144">
        <ParentCategory>16131</ParentCategory>
        <subCategory>
            <label>
                <id>444</id>
                <name>label444</name>
            </label>
        </subCategory>
    </Category>
    <Category Definition="AccessoriesCategory"
    name="16195" id="16195">
        <ParentCategory>16131</ParentCategory>
    </Category>
</MediaCatalog>

The resulting document should look like this:

<MediaCatalog>
    <Category>
        <ParentCategory>
        </ParentCategory>
        <genre>
            <id></id>
            <name></name>
        </genre>
        <language>
            <id></id>
            <name></name>
            <shortName></shortName>
        </language>
        <subCategory>
            <label>
                <id></id>
                <name></name>
            </label>
        </subCategory>
    </Category>
</MediaCatalog>

Based on the answers I've found for similar problems, I've come up with the following transformation to achieve that:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" encoding="UTF-8"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kElemByName" match="*" use="local-name()"/>

 <xsl:template match="
  *[generate-id()
   =
    generate-id(key('kElemByName', local-name())[1])
   ]">
  <xsl:value-of select="concat('&lt;', local-name(), '&gt;', '&#xA;')" disable-output-escaping="yes" />
  <xsl:apply-templates select="*"/>
  <xsl:value-of select="concat('&lt;/', local-name(), '&gt;', '&#xA;')" disable-output-escaping="yes" />
 </xsl:template>
 <xsl:template match="text()">
 </xsl:template>
</xsl:stylesheet>

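However, this does not give me the correct result, because the keys for the Muenchian grouping method are based on just the node names, obtained with the function local-name(). A name such as id is therefore emitted only at its first occurrence anywhere in the document, regardless of its parent.

So, applying this to the source XML above, what I get instead of the correct output is: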

<?xml version="1.0" encoding="UTF-8"?>
<MediaCatalog>
    <Category>
    </Category>
    <ParentCategory>
    </ParentCategory>
    <genre>
        <id>
        </id>
        <name>
        </name>
    </genre>
    <language>
        <shortName>
        </shortName>
    </language>
    <subCategory>
        <label>
        </label>
    </subCategory>
</MediaCatalog>

To create the correct output, the complete node path has to be used as the key instead of just the node name. The question is how to do this in XSLT because, as far as I know, there is no built-in function such as getFullPath() that returns the full path of the current node.
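For illustration only, here is a minimal sketch of how such a path string can be computed and used as a grouping key. It assumes an XSLT 2.0 processor (the attempt above is XSLT 1.0, where the use attribute of xsl:key cannot call a template to build a path of arbitrary depth). The path is assembled from the local names of ancestor-or-self::*, so namespaces are ignored. The stylesheet only lists the distinct paths as text; it does not build the nested result document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Group every element by the slash-separated chain of its
         ancestors' local names (the "full path" of the element). -->
    <xsl:for-each-group select="//*"
        group-by="string-join(ancestor-or-self::*/local-name(), '/')">
      <!-- One line per distinct path, in order of first appearance. -->
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the source document above, the listing should start with MediaCatalog, MediaCatalog/Category and MediaCatalog/Category/ParentCategory, and each path such as MediaCatalog/Category/genre/id should appear exactly once.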

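The following XSLT 2.0 stylesheet produces the desired output (except that it is not nicely indented: the tags are written as unescaped text rather than created as elements, so indent="yes" has nothing to indent):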

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- Template for recursive handling of the child nodes. 
       Note that these nodes originate from different parents belonging to the same group! 
       This is important since we have to scan ALL subtrees of a group and not only the first one!
  -->
  <xsl:template name="descend">
    <xsl:param name="nodes"/>

    <xsl:for-each-group select="$nodes" group-by="local-name()">

      <xsl:value-of select="concat('&lt;', current-grouping-key(), '&gt;', '&#xA;')" disable-output-escaping="yes" />

      <xsl:call-template name="descend">
        <!-- Call recursively. -->
        <xsl:with-param name="nodes" select="current-group()/*"/>
      </xsl:call-template>

      <xsl:value-of select="concat('&lt;/', current-grouping-key(), '&gt;', '&#xA;')" disable-output-escaping="yes" />

    </xsl:for-each-group>
  </xsl:template>

  <!-- Start the recursion with the children of the root node. -->
  <xsl:template match="/">
    <xsl:call-template name="descend">
      <xsl:with-param name="nodes" select="*"/>
    </xsl:call-template>    
  </xsl:template>

</xsl:stylesheet>
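
The stylesheet above relies on xsl:for-each-group, which requires XSLT 2.0, while the attempt in the question was written for XSLT 1.0. The same recursion can be sketched in XSLT 1.0 as shown below. This is an untested-in-production sketch, not the original author's code. The variable names name and group are my own. It assumes elements without namespaces, and it compares names within the current node-set on every step, which is quadratic in the size of each level. It also creates real elements with xsl:element instead of writing escaped tag text, so the serializer can apply indent="yes".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- $nodes holds sibling-level elements that may come from
       different parents belonging to the same group. -->
  <xsl:template name="descend">
    <xsl:param name="nodes"/>
    <xsl:for-each select="$nodes">
      <xsl:variable name="name" select="local-name()"/>
      <!-- All elements of the current node-set with the same name
           form one group. -->
      <xsl:variable name="group" select="$nodes[local-name() = $name]"/>
      <!-- Emit the group only once, when visiting its first member. -->
      <xsl:if test="generate-id() = generate-id($group[1])">
        <xsl:element name="{$name}">
          <!-- Recurse into the children of ALL members of the group. -->
          <xsl:call-template name="descend">
            <xsl:with-param name="nodes" select="$group/*"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Start the recursion with the document element. -->
  <xsl:template match="/">
    <xsl:call-template name="descend">
      <xsl:with-param name="nodes" select="*"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Leaf elements are serialized as empty elements such as <id/>, which is equivalent to the <id></id> form shown in the desired output. Attributes and text are never copied, because the template only ever creates elements and recurses into child elements.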

