XPath expression to select unique nodes

I'm working on a project where I have to transform some XML input to some XML output, and for this I'm using XSLT version 1.

The input XML files I'm working on are huge (10k+ lines), but I've spent the better part of an hour boiling the problem down to the following snippet, which captures it.

This is the input XML

<QueryInput >
  <Subject>
    <Content>
      <MunicipalityCode>0217</MunicipalityCode>
    </Content>
  </Subject>
  <QueryResultStep>
    <Multistep>
      <IterationResponse>
        <QueryResult>
          <Kommune>0217</Kommune>
        </QueryResult>
      </IterationResponse>
      <IterationResponse>
        <QueryResult>
          <Kommune>0217</Kommune>
        </QueryResult>
      </IterationResponse>
      <IterationResponse>
        <QueryResult>
          <Kommune>0223</Kommune>
        </QueryResult>
      </IterationResponse>
      <IterationResponse>
        <QueryResult>
          <Kommune>0223</Kommune>
        </QueryResult>
      </IterationResponse>
    </Multistep>
  </QueryResultStep>
</QueryInput>

The output XML should contain each "Kommune" value once, with duplicates removed. For this I wrote the following XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:transform version="1.0" xmlns:msxsl="urn:schemas-microsoft-com:xslt"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               exclude-result-prefixes="xsl xsi xsd">

  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">

    <QueryResult>
      <xsl:variable name="something">
        <KommuneCollection>
          <xsl:for-each select="QueryInput/QueryResultStep/Multistep/IterationResponse/QueryResult/Kommune[not(.=preceding::*)]">
            <NewKommune>
              <xsl:value-of select="."/>
            </NewKommune>
          </xsl:for-each>
        </KommuneCollection>
      </xsl:variable>
      <xsl:copy-of select="$something"/>
    </QueryResult>
  </xsl:template>
</xsl:transform>

Which produces the following (almost correct) output:

<KommuneCollection>
    <NewKommune>0223</NewKommune>
</KommuneCollection>

But should produce

<KommuneCollection>
    <NewKommune>0217</NewKommune>
    <NewKommune>0223</NewKommune>
</KommuneCollection>

If I remove the <MunicipalityCode>0217</MunicipalityCode> element from the input XML, it suddenly works, but I really don't understand why. I don't know why this happens or how to address it. Any help is greatly appreciated!

EDIT: The issue can easily be reproduced by copying the input XML into Notepad++, installing the XPathenizer tool, opening its window, entering the XPath expression QueryInput/QueryResultStep/Multistep/IterationResponse/QueryResult/Kommune[not(.=preceding::*)] and executing it. The results are then shown on the right side. I suspect the problem lies in the XPath expression used in the for-each tag in the XSLT.

As michael.hor257k says, Muenchian grouping will help you when dealing with large files. That said, here is a corrected version of your current attempt:

<xsl:transform version="1.0"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    exclude-result-prefixes="xsl xsi xsd">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
    <QueryResult>
        <KommuneCollection>
            <xsl:for-each select="QueryInput/QueryResultStep/Multistep/IterationResponse/QueryResult/Kommune[not(. = preceding::QueryResult/Kommune )]">
                <NewKommune>
                    <xsl:value-of select="."/>
                </NewKommune>
            </xsl:for-each>
        </KommuneCollection>
    </QueryResult>
</xsl:template>
</xsl:transform>

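Note: this approach is less efficient, because every Kommune is compared against all the Kommune elements that precede it. You will notice the difference on large files when you switch to Muenchian grouping.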

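Your predicate would have worked, but it failed to include "0217" because /QueryInput/Subject/Content/MunicipalityCode happens to have the value "0217". The preceding::* step selects every element that comes before the context node in document order (ancestors excluded), not only Kommune elements, so the first Kommune was compared against MunicipalityCode and filtered out as a duplicate.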
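To see which preceding elements collide with the first Kommune, a small diagnostic stylesheet can help. This is only a sketch for investigation, not part of the fix; the variable name first is my own choice. It prints the name of every preceding element whose string value equals that of the first Kommune.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The first Kommune in document order (hypothetical variable name). -->
    <xsl:variable name="first" select="(//Kommune)[1]"/>
    <!-- Every preceding element whose string value equals that Kommune's value. -->
    <xsl:for-each select="$first/preceding::*[. = $first]">
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
```

Run against the sample input, the list should include MunicipalityCode. A processor that strips whitespace-only text nodes when loading the document may also list Subject and Content, since their string values then reduce to "0217" as well.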

If you adjust your predicate filter to match for preceding Kommune elements instead of any preceding element, then it will produce the desired results:

[not(.=preceding::Kommune)]

However, it isn't very efficient. If your file is huge, defining an xsl:key and using the Muenchian method will perform better:

<?xml version="1.0" encoding="utf-8"?>
<xsl:transform version="1.0" 
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    exclude-result-prefixes="xsl xsi xsd">

    <xsl:output method="xml" indent="yes"/>
    <xsl:key name="Kommune" match="Kommune" use="."/>
    <xsl:template match="/">     
        <QueryResult>
            <xsl:variable name="something">
                <KommuneCollection>
                    <xsl:for-each 
                          select="QueryInput/QueryResultStep/Multistep/
                                    IterationResponse/QueryResult/
                                    Kommune[generate-id(.) = 
                                            generate-id(key('Kommune',.)[1])]">
                        <NewKommune>
                            <xsl:value-of select="."/>
                        </NewKommune>
                    </xsl:for-each>
                </KommuneCollection>
            </xsl:variable>
            <xsl:copy-of select="$something"/>
        </QueryResult>
    </xsl:template>
</xsl:transform>
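The same grouping test is often written with count() and a union instead of generate-id(). The following is a sketch of that variant, assuming Kommune elements occur only under the path used in the for-each (the key indexes every Kommune in the document). The key name kommune-by-value is my own, and the intermediate variable is dropped because the result can be written directly.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every Kommune element by its string value (hypothetical key name). -->
  <xsl:key name="kommune-by-value" match="Kommune" use="."/>

  <xsl:template match="/">
    <QueryResult>
      <KommuneCollection>
        <!-- The union of a node with the first node of its group
             contains exactly one node only when both are the same node. -->
        <xsl:for-each
            select="QueryInput/QueryResultStep/Multistep/IterationResponse/QueryResult/Kommune[count(. | key('kommune-by-value', .)[1]) = 1]">
          <NewKommune>
            <xsl:value-of select="."/>
          </NewKommune>
        </xsl:for-each>
      </KommuneCollection>
    </QueryResult>
  </xsl:template>
</xsl:transform>
```

For the sample input this should emit one NewKommune for 0217 and one for 0223, regardless of the MunicipalityCode element, because only Kommune elements are indexed by the key.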
