
XSLT split element containing text and child nodes

I have a node containing text and child elements, and I would like to wrap each sentence in its own element. The problem is that the node can have child elements, and even grandchildren, that themselves contain the separator (a full stop).

So this XML:

    <text>
        <p>Lorem ipsum dolor sit <a>amet</a>, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
            labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco
            laboris nisi ut aliquip ex ea commodo consequat. Duis <b>aute</b> irure dolor in reprehenderit in
            voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
            cupidatat <d>non <c>proident</c>, sunt in. culpa</d> qui officia deserunt mollit anim id est laborum.</p>
    </text>

should be split at each full stop that occurs directly in the text of the current element, but not at a full stop inside a child element (for example, the period inside element <d/>).

The desired output:

    <text>
        <sentence>Lorem ipsum dolor sit <a>amet</a>, consectetur adipiscing elit, sed do eiusmod tempor
            incididunt ut labore et dolore magna aliqua.</sentence>
        <sentence>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
            ea commodo consequat.</sentence>
        <sentence>Duis <b>aute</b> irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
            fugiat nulla pariatur.</sentence>
        <sentence>Excepteur sint occaecat cupidatat <d>non <c>proident</c>, sunt in.
            culpa</d> qui officia deserunt mollit anim id est laborum.</sentence>
    </text>

I've seen the answer to "XSLT split elements by splitter-element", which uses <xsl:for-each-group select="node()" group-adjacent="boolean(self::separator)"> or group-starting-with="separator", but so far it has not helped me select the text and nodes leading up to the period.

I can use XSLT 2/3.

This becomes easy in two steps: first use a separate mode to convert each full stop in the text children of p into a marker element, then group the resulting nodes with group-ending-with:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="3.0">

      <!-- Default mode: copy everything that has no more specific template -->
      <xsl:mode on-no-match="shallow-copy"/>

      <xsl:template match="p">
         <!-- Step 1: copy the children of p, turning each top-level "." into <full-stop> -->
         <xsl:variable name="marked-text" as="node()*">
             <xsl:apply-templates mode="insert-marker"/>
         </xsl:variable>
         <!-- Step 2: every group ends at a <full-stop> marker and becomes one sentence -->
         <xsl:for-each-group select="$marked-text" group-ending-with="full-stop">
             <sentence>
                 <xsl:apply-templates select="current-group()"/>
             </sentence>
         </xsl:for-each-group>
      </xsl:template>

      <xsl:mode name="insert-marker" on-no-match="shallow-copy"/>

      <!-- Only text nodes that are direct children of p are split; text inside
           child elements such as d is copied unchanged by the shallow-copy mode -->
      <xsl:template mode="insert-marker" match="p/text()">
          <xsl:analyze-string select="." regex="\.">
              <xsl:matching-substring>
                  <full-stop>.</full-stop>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                  <xsl:value-of select="."/>
              </xsl:non-matching-substring>
          </xsl:analyze-string>
      </xsl:template>

      <!-- Turn the marker back into a literal full stop in the output -->
      <xsl:template match="full-stop">
          <xsl:text>.</xsl:text>
      </xsl:template>

    </xsl:stylesheet>
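
One case the sample input does not exercise is text after the last full stop, for example a line break before </p>. With the grouping above, that trailing whitespace would form a final group and produce an empty sentence element. The following is a minimal sketch of one way to guard against that, assuming a group that contains nothing but whitespace should simply be dropped; that rule is an assumption and not part of the original question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:mode name="insert-marker" on-no-match="shallow-copy"/>

  <xsl:template match="p">
    <xsl:variable name="marked-text" as="node()*">
      <xsl:apply-templates mode="insert-marker"/>
    </xsl:variable>
    <xsl:for-each-group select="$marked-text" group-ending-with="full-stop">
      <!-- Assumption: skip a group whose nodes are all whitespace-only,
           such as a line break after the final full stop -->
      <xsl:if test="current-group()[normalize-space()]">
        <sentence>
          <xsl:apply-templates select="current-group()"/>
        </sentence>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Same marker step as above: split only the direct text children of p -->
  <xsl:template mode="insert-marker" match="p/text()">
    <xsl:analyze-string select="." regex="\.">
      <xsl:matching-substring>
        <full-stop>.</full-stop>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <xsl:template match="full-stop">
    <xsl:text>.</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The predicate keeps a group as soon as one of its nodes has a non-empty normalized string value. Every group that ends in a full-stop marker passes the test, so only a trailing whitespace-only group is affected. Trailing text that does contain words but no final full stop would still be wrapped in a sentence element.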

