
XSLT merge nodes

So I have a messy XHTML file that I would like to transform into XML. It is a lexicon with lots of 'p' tags, and I want to sort them out. Here is the XHTML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta content="2018-06-29T10:12:48Z" name="dcterms.created" />
        <meta content="2018-06-29T10:12:48Z" name="dcterms.modified" />
    </head>
    <body>
        <p><b>Aesthetik</b></p>
        <p>text about aesthetics.</p>
        <p><b>Expl: </b>explanation about aesthetics</p>
        <p><b>BegrG: </b>origin of the term</p>
        <p>more origin of the term</p>
        <p><b>Allegorese</b></p>
        <p>text about Allegorese</p>
        <p><b>Expl: </b>explanation about Allegorese</p>
        <p><b>BegrG: </b>origin of Allegorese</p>
    </body>
</html>

The XSLT file looks like this (there are several additional templates for other tags, which I left out here):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml">

<xsl:template match="head"/>

<xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
</xsl:template>

<xsl:template match="body">
    <lexica>
        <xsl:apply-templates/>      <!-- create root node lexica -->
    </lexica>
</xsl:template>

<xsl:template match="p">
    <p>
        <xsl:apply-templates/> <!-- copy same tags for better visuality -->
    </p>
</xsl:template>

<xsl:template match="p[b[contains(., 'BegrG')]]">
    <BegrG>
        <xsl:apply-templates/>  <!-- create specific nodes with origin explanation of the word -->
    </BegrG>
</xsl:template>

<xsl:template match="p[b[contains(., 'Expl')]]">
    <Expl>
        <xsl:apply-templates/>  <!-- node with explanation of the word --> 
    </Expl>
</xsl:template>


<xsl:template match="p[b[not(contains(., 'Expl') or contains(., 'BegrG'))]]">
    <!-- any other p with a b child is a lexical item -->
    <Artikel>
        <xsl:apply-templates/>
    </Artikel>
</xsl:template>

</xsl:stylesheet>

In the end, my XML output looks like this:

<lexica>
    <Artikel>Aesthetik</Artikel>
    <p>text about aesthetics.</p>
    <Expl>Expl:explanation about aesthetics</Expl>
    <BegrG>BegrG:origin of the term</BegrG>
    <p>more origin of the term</p>
    <Artikel>Allegorese</Artikel>
    <p>text about Allegorese</p>
    <Expl>Expl:explanation about Allegorese</Expl>
    <BegrG>BegrG:origin of Allegorese</BegrG>
</lexica>

This looks better, but it still won't work because it isn't structured enough: for example, the terms aren't grouped, and some 'p' elements should be merged into their preceding sibling. It should look like this:

<lexica>
 <item>
  <Artikel>Aesthetik</Artikel>
  <short>text about aesthetics.</short>
  <Expl>Expl:explanation about aesthetics</Expl>
  <BegrG>BegrG:origin of the term. more origin of the term.</BegrG>
 </item>

 <item>
  <Artikel>Allegorese</Artikel>
  <short>text about Allegorese</short>
  <Expl>Expl:explanation about Allegorese</Expl>
  <BegrG>BegrG:origin of Allegorese</BegrG>
 </item>
</lexica>

Am I approaching this wrong? How am I supposed to merge 'p' elements into the preceding sibling that has a 'b' child? And how do I separate the term items from each other, so that the stylesheet knows where each item ends and the closing tag should go?

(Sorry for my bad English)

Thanks in advance!

XSLT 2 and 3 have xsl:for-each-group with group-starting-with (https://www.w3.org/TR/xslt20/#xsl-for-each-group), so you can create the item elements with

  <xsl:template match="body">
      <lexica>
          <xsl:for-each-group select="*" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
              <item>
                  <xsl:apply-templates select="current-group()"/>
              </item>
          </xsl:for-each-group>
      </lexica>
  </xsl:template>

An example is at https://xsltfiddle.liberty-development.net/bFDb2CG .
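The grouping above still outputs the paragraph that directly follows a headword as p, while the desired result has short there. A minimal sketch, assuming the rule is "an untagged p whose immediately preceding sibling is a headword paragraph becomes short" (that rule is my reading of the sample data, not something stated in the question), is one extra template that the apply-templates call inside item will pick up:

```xslt
<!-- Assumed rule: an untagged p directly after a headword p becomes <short>. -->
<!-- The stylesheet's xpath-default-namespace makes "p" and "b" match the XHTML elements. -->
<xsl:template match="p[not(b)][preceding-sibling::*[1][self::p[b[not(matches(., '^(Expl|BegrG):'))]]]]">
    <short>
        <xsl:apply-templates/>
    </short>
</xsl:template>
```

Because the pattern has predicates, its default priority (0.5) is higher than that of the plain match="p" template, so it wins for those paragraphs; it cannot clash with the Expl, BegrG or Artikel templates because those all require a b child.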

I am not sure yet which p elements should be merged into the BegrG result. If it is every p that follows a BegrG paragraph, a nested grouping with

  <xsl:template match="body">
      <lexica>
          <xsl:for-each-group select="*" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
              <item>
                  <xsl:for-each-group select="current-group()" group-starting-with="p[b[starts-with(., 'BegrG:')]]">
                      <xsl:choose>
                          <xsl:when test="self::p[b[starts-with(., 'BegrG:')]]">
                              <BegrG>
                                  <xsl:apply-templates select="current-group()/node()"/>
                              </BegrG>
                          </xsl:when>
                          <xsl:otherwise>
                              <xsl:apply-templates select="current-group()"/>
                          </xsl:otherwise>
                      </xsl:choose>
                  </xsl:for-each-group>
              </item>
          </xsl:for-each-group>
      </lexica>
  </xsl:template>

implements that: https://xsltfiddle.liberty-development.net/bFDb2CG/1

As for the problem raised in the comment, you could add another pattern to group-starting-with, so that Expl paragraphs also absorb the paragraphs that follow them:

  <xsl:template match="body">
      <lexica>
          <xsl:for-each-group select="*" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
              <item>
                  <xsl:for-each-group select="current-group()" group-starting-with="p[b[starts-with(., 'Expl:')]] | p[b[starts-with(., 'BegrG:')]]">
                      <xsl:choose>
                          <xsl:when test="self::p[b[starts-with(., 'Expl:')]]">
                              <Expl>
                                  <xsl:apply-templates select="current-group()/node()"/>
                              </Expl>
                          </xsl:when>
                          <xsl:when test="self::p[b[starts-with(., 'BegrG:')]]">
                              <BegrG>
                                  <xsl:apply-templates select="current-group()/node()"/>
                              </BegrG>
                          </xsl:when>
                          <xsl:otherwise>
                              <xsl:apply-templates select="current-group()"/>
                          </xsl:otherwise>
                      </xsl:choose>
                  </xsl:for-each-group>
              </item>
          </xsl:for-each-group>
      </lexica>
  </xsl:template>

https://xsltfiddle.liberty-development.net/bFDb2CG/2
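Putting the pieces together, here is a complete XSLT 2.0 stylesheet sketch. It assumes an XSLT 2.0 processor such as Saxon, that the input always follows the headword / text / Expl / BegrG pattern from the question, and the same assumed "short" rule as above. It also drops the question's global text() template: normalizing every text node separately is what turns "Expl: explanation" into "Expl:explanation", because the trailing space inside b is removed before the two strings are joined. Here each output element instead takes normalize-space() of the whole paragraph, and merged paragraphs are joined with a single space.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

    <xsl:output method="xml" indent="yes"/>

    <!-- Only the body paragraphs matter; head and stray whitespace are never visited. -->
    <xsl:template match="/">
        <xsl:apply-templates select="html/body"/>
    </xsl:template>

    <xsl:template match="body">
        <lexica>
            <!-- Outer grouping: every headword paragraph (b that is not an Expl/BegrG label) starts an item. -->
            <xsl:for-each-group select="p" group-starting-with="p[b[not(matches(., '^(Expl|BegrG):'))]]">
                <item>
                    <!-- Inner grouping: an Expl or BegrG paragraph absorbs the untagged paragraphs after it. -->
                    <xsl:for-each-group select="current-group()" group-starting-with="p[b[matches(., '^(Expl|BegrG):')]]">
                        <xsl:choose>
                            <xsl:when test="self::p[b[starts-with(., 'Expl:')]]">
                                <Expl><xsl:value-of select="current-group()/normalize-space()" separator=" "/></Expl>
                            </xsl:when>
                            <xsl:when test="self::p[b[starts-with(., 'BegrG:')]]">
                                <BegrG><xsl:value-of select="current-group()/normalize-space()" separator=" "/></BegrG>
                            </xsl:when>
                            <xsl:otherwise>
                                <!-- The group before the first label: the headword and its short text. -->
                                <xsl:apply-templates select="current-group()"/>
                            </xsl:otherwise>
                        </xsl:choose>
                    </xsl:for-each-group>
                </item>
            </xsl:for-each-group>
        </lexica>
    </xsl:template>

    <!-- Headword paragraph. -->
    <xsl:template match="p[b[not(matches(., '^(Expl|BegrG):'))]]">
        <Artikel><xsl:value-of select="normalize-space()"/></Artikel>
    </xsl:template>

    <!-- Assumed rule: the untagged paragraph directly after the headword is the short text. -->
    <xsl:template match="p[not(b)][preceding-sibling::*[1][self::p[b[not(matches(., '^(Expl|BegrG):'))]]]]">
        <short><xsl:value-of select="normalize-space()"/></short>
    </xsl:template>

    <!-- Anything else left in the first group is copied as a plain p. -->
    <xsl:template match="p">
        <p><xsl:value-of select="normalize-space()"/></p>
    </xsl:template>

</xsl:stylesheet>
```

For the sample input this should yield elements such as <Expl>Expl: explanation about aesthetics</Expl> and <BegrG>BegrG: origin of the term more origin of the term</BegrG>. The periods shown in the desired output are not in the source text, so the sketch does not add them.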
