I have a node containing text and child elements. I would like to wrap each sentence in its own element. The problem is that the element can have child nodes, and even grandchildren, that themselves contain the separator (a full stop).
So this XML:
<text>
<p>Lorem ipsum dolor sit <a>amet</a>, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat. Duis <b>aute</b> irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat <d>non <c>proident</c>, sunt in. culpa</d> qui officia deserunt mollit anim id est laborum.</p>
</text>
It should be split on each full stop that appears directly in the text of the current() element, but not on full stops inside child elements (e.g. see the period inside element <d/>).
The desired output:
<text>
<sentence>Lorem ipsum dolor sit <a>amet</a>, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua.</sentence>
<sentence>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat.</sentence>
<sentence>Duis <b>aute</b> irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur.</sentence>
<sentence>Excepteur sint occaecat cupidatat <d>non <c>proident</c>, sunt in.
culpa</d> qui officia deserunt mollit anim id est laborum.</sentence>
</text>
I've seen this answer: XSLT split elements by splitter-element with <xsl:for-each-group select="node()" group-adjacent="boolean(self::separator)"> or group-starting-with="separator"
but so far it has not helped me select the text and nodes leading up to the period.
I can use XSLT 2/3.
By using one mode to convert each dot in the text children of p into a marker element, followed by a grouping step with group-ending-with, this should be easy:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <!-- Default mode: copy everything that has no more specific template -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="p">
    <!-- Step 1: copy the content of p, turning each dot in its own text into a marker element -->
    <xsl:variable name="marked-text" as="node()*">
      <xsl:apply-templates mode="insert-marker"/>
    </xsl:variable>
    <!-- Step 2: close a group at every marker and wrap it in a sentence element -->
    <xsl:for-each-group select="$marked-text" group-ending-with="full-stop">
      <sentence>
        <xsl:apply-templates select="current-group()"/>
      </sentence>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:mode name="insert-marker" on-no-match="shallow-copy"/>

  <!-- Only text nodes that are direct children of p are split; dots in descendants are untouched -->
  <xsl:template mode="insert-marker" match="p/text()">
    <xsl:analyze-string select="." regex="\.">
      <xsl:matching-substring>
        <full-stop>.</full-stop>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Turn the marker back into a plain dot in the output -->
  <xsl:template match="full-stop">
    <xsl:text>.</xsl:text>
  </xsl:template>

</xsl:stylesheet>
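To make the two steps easier to follow, here is a small hand-worked trace. The input is a made-up one-line example (not the data from the question); the middle fragment shows what the marked-text variable should hold after the insert-marker pass, and the last fragment shows what the grouping step should produce from it.

```xml
<!-- Hypothetical input, chosen only for illustration -->
<p>One <a>x.</a> two. Three.</p>

<!-- Content of $marked-text after the insert-marker pass:
     the dot inside a is left alone, the two dots in p's own text become markers -->
One <a>x.</a> two<full-stop>.</full-stop> Three<full-stop>.</full-stop>

<!-- Result of the grouping step: each group ends at a marker,
     and the full-stop template turns the marker back into "." -->
<sentence>One <a>x.</a> two.</sentence><sentence> Three.</sentence>
```

Note that the second sentence keeps its leading space, because the whitespace after a dot belongs to the following group. Likewise, any text after the last full stop (even whitespace only) would end up in a final sentence element of its own, so you may want to trim or filter such groups if your real data has them.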