Whitespace stripping with XslCompiledTransform

I'm trying to migrate a large app from XslTransform to compiled XSL files and XslCompiledTransform.

The app uses XSL to create HTML files, and the transformation data (XML) was passed to the XSL as an XmlDataDocument returned from the database.

I've changed all that, so now I do (at least temporarily):

C#

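 public string ProcessCompiledXsl(XmlDataDocument xml)
 {
       StringBuilder stringControl = new StringBuilder();
       XslCompiledTransform xslTran = new XslCompiledTransform();

       // Load the stylesheet class that was precompiled into the CompiledXsl assembly.
       xslTran.Load(
           System.Reflection.Assembly.Load("CompiledXsl").GetType(dllName)
       );

       // Dispose the writer so that any buffered output is flushed into the StringBuilder.
       using (XmlWriter writer = XmlWriter.Create(stringControl, othersettings))
       {
           xslTran.Transform(xml, this.Arguments, writer, null);
       }

       return stringControl.ToString();
 }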

XSL (just an example)

...
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>

Problem

That works, but the XSL is stripping the whitespace between the tags, outputting:

<a href="#">
   some text
</a><a href="#">
   some text
</a><a href="#">
   some text
</a><a...etc

I've tried:

  • Using xml:space="preserve", but I couldn't get it to work
  • Overriding the OutputSettings, but I didn't get any good results (maybe I missed something)
  • Using xsl:output method="xml", which works but creates self-closing tags and a lot of other problems
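
For reference on the OutputSettings point: when Transform is given an XmlWriter, the xsl:output element is not applied automatically, and the writer's own settings decide how the result is serialized. The sketch below shows one way to create the writer from the transform's OutputSettings so that method="html" and indent="yes" are carried over. The inline stylesheet and input are hypothetical stand-ins for the compiled XSL and the database XML, the program uses C# 9 top-level statements, and this is not claimed to fix the missing line breaks by itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical inline stylesheet and input (assumptions, not the original files).
const string stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='yes'/>
  <xsl:template match='/'>
    <xsl:for-each select='//Object/Table'>
      <a href='#'>some text</a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";
const string input = "<Object><Table/><Table/></Object>";

var xslt = new XslCompiledTransform();
using (var stylesheetReader = XmlReader.Create(new StringReader(stylesheet)))
{
    xslt.Load(stylesheetReader);
}

// OutputSettings reflects the stylesheet's xsl:output element after Load.
XmlWriterSettings settings = xslt.OutputSettings ?? new XmlWriterSettings();
Console.WriteLine($"method={settings.OutputMethod}, indent={settings.Indent}");

var html = new StringBuilder();
using (var inputReader = XmlReader.Create(new StringReader(input)))
using (var writer = XmlWriter.Create(html, settings))
{
    // The writer was built from OutputSettings, so it serializes as HTML.
    xslt.Transform(inputReader, null, writer);
}
Console.WriteLine(html.ToString());
```

If the writer is created from unrelated settings instead, the html output method is likely lost, which may explain part of the difference in behaviour after the migration.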

So I don't know what to do. Maybe I'm not doing something right. Any help is really appreciated.

Thanks!

EDIT

Just for future reference: if you want to tackle this problem while leaving every XSL intact, you could try this C# class I wrote, named CustomHtmlWriter.

Basically, what I did is extend XmlTextWriter and modify the methods that write the start and the end of every tag.

In this particular case, you would use it like this:

    StringBuilder sb = new StringBuilder();
    CustomHtmlWriter writer = new CustomHtmlWriter(sb);

    xslTran.Transform(nodeReader, this.Arguments, writer);

    return sb.ToString();

Hope it helps someone.

I. Solution 1:

Let me first analyze the problem here:

Given this source XML document (invented, as you haven't provided any):

<Object>
 <Table>

 </Table>

 <Table>

 </Table>

 <Table>

 </Table>

 <Table>

 </Table>
</Object>

This transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>
<!--
 <xsl:template match="Table">
   <a href="#">
    Table here
   </a>
 </xsl:template>
 -->
</xsl:stylesheet>

exactly reproduces the problem. The result is:

<a href="#">
                     some text
              </a><a href="#">
                     some text
              </a><a href="#">
                     some text
              </a><a href="#">
                     some text
              </a>

Now, just uncomment the commented template and comment out the first template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes"/>
<!--
  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>
 -->
 <xsl:template match="Table">
   <a href="#">
    Table here
   </a>
 </xsl:template>
</xsl:stylesheet>

The result has the wanted indentation:

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

And this was solution 1.


II. Solution 2:

This solution may reduce to a minimum the required modifications to your existing XSLT code.

This is a two-pass transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="ext">
 <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:variable name="vrtfPass1">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
    </xsl:variable>

    <xsl:apply-templates select=
        "ext:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

 <xsl:template match="node()|@*" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="pass2"/>
  </xsl:copy>
 </xsl:template>

  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
   <xsl:text>&#xA;</xsl:text>
   <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The idea is that we don't even touch the existing code: we capture its output and, using only a few lines of additional code, format that output to have the wanted final appearance.

When this transformation is applied to the same XML document, the same wanted result is produced:

<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
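
The two-pass idea can be exercised from C# roughly as follows. This is a minimal sketch, assuming the stylesheet is supplied inline as a string (compacted from the one above) and that a small invented input document is enough for illustration. It relies on XslCompiledTransform supporting the node-set() extension function in the urn:schemas-microsoft-com:xslt namespace, and uses C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Two-pass stylesheet, compacted from the answer; the inline string is an assumption.
const string twoPass = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:ext='urn:schemas-microsoft-com:xslt' exclude-result-prefixes='ext'>
  <xsl:output method='html'/>
  <xsl:template match='/'>
    <xsl:variable name='vrtfPass1'>
      <xsl:for-each select='//Object/Table'>
        <a href='#'>some text</a>
      </xsl:for-each>
    </xsl:variable>
    <xsl:apply-templates select='ext:node-set($vrtfPass1)' mode='pass2'/>
  </xsl:template>
  <xsl:template match='node()|@*' mode='pass2'>
    <xsl:copy>
      <xsl:apply-templates select='node()|@*' mode='pass2'/>
    </xsl:copy>
  </xsl:template>
  <xsl:template mode='pass2' match='*[preceding-sibling::node()[1][self::*]]'>
    <xsl:text>&#xA;</xsl:text>
    <xsl:copy-of select='.'/>
  </xsl:template>
</xsl:stylesheet>";
const string input = "<Object><Table/><Table/><Table/></Object>";

var xslt = new XslCompiledTransform();
using (var stylesheetReader = XmlReader.Create(new StringReader(twoPass)))
{
    xslt.Load(stylesheetReader);
}

// Writing to a TextWriter lets the transform apply its own xsl:output settings.
var result = new StringWriter();
using (var inputReader = XmlReader.Create(new StringReader(input)))
{
    xslt.Transform(inputReader, null, result);
}
string html = result.ToString();
Console.WriteLine(html);
```

The second pass only inserts a line break before an element whose immediately preceding sibling is also an element, so the first anchor is copied unchanged and every following one starts on a new line.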

Finally, here is a demonstration of how this minor change can be introduced without touching any existing XSLT code at all:

Let's have this existing code in c:\temp\delete\existing.xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//Object/Table">
      <a href="#">
        some text
      </a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

If we run this, we get the problematic output.

Now, instead of running existing.xsl, we run this transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="ext">
 <xsl:import href="file:///c:/temp/delete/existing.xsl"/>
 <xsl:output method="html"/>


  <xsl:template match="/">
    <xsl:variable name="vrtfPass1">
       <xsl:apply-imports/>
    </xsl:variable>

    <xsl:apply-templates select=
        "ext:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

 <xsl:template match="node()|@*" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="pass2"/>
  </xsl:copy>
 </xsl:template>

  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
   <xsl:text>&#xA;</xsl:text>
   <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The result is the wanted one, and the existing code is left completely untouched:

<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>

Explanation:

  1. We import any existing code that is at the top level of the import-precedence hierarchy (not imported by other stylesheets), using xsl:import.

  2. We capture the output of the existing transformation in a variable. It is of the infamous RTF (Result Tree Fragment) type, which needs to be converted to a regular tree to be processed further.

  3. The key moment is performing xsl:apply-imports when capturing the output of the transformation. This ensures that any template from the existing code (even one that we override, such as the template matching /) will be selected for execution, just as when the existing transformation is performed by itself.

  4. We convert the RTF into a regular tree using the msxsl:node-set() extension function, bound to the ext prefix in the code above (XslCompiledTransform also supports the EXSLT node-set() extension function).

  5. We perform our cosmetic adjustments on the regular tree so produced.

Do note:

This represents a general algorithm for post-processing existing transformations without touching the existing code.

I don't remember the details of XML/XSLT whitespace preservation off the top of my head, but one case where whitespace is more likely to be discarded is between elements with no non-whitespace text between them (i.e. whitespace-only text nodes, like the one between </a> and </xsl:for-each>). You can prevent this by using the <xsl:text> element.

For example, after

          <a href="#">
                 some text
          </a>

put

          <xsl:text>&#10;</xsl:text>

I.e. a literal line-end character.

Does that meet your requirements?
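
A quick way to check the xsl:text suggestion is to run it through XslCompiledTransform. The sketch below assumes an inline stylesheet (the question's template with the extra xsl:text line added) and an invented input document, and is written with C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical stylesheet: the question's template plus an explicit newline after </a>.
const string stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html'/>
  <xsl:template match='/'>
    <xsl:for-each select='//Object/Table'>
      <a href='#'>some text</a>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";
const string input = "<Object><Table/><Table/><Table/></Object>";

var xslt = new XslCompiledTransform();
using (var stylesheetReader = XmlReader.Create(new StringReader(stylesheet)))
{
    xslt.Load(stylesheetReader);
}

// The whitespace-only text nodes around <a> are stripped from the stylesheet,
// but the newline inside xsl:text is kept and written after every anchor.
var result = new StringWriter();
using (var inputReader = XmlReader.Create(new StringReader(input)))
{
    xslt.Transform(inputReader, null, result);
}
string html = result.ToString();
Console.WriteLine(html);
```

The trade-off compared with a post-processing pass is that the xsl:text element has to be added at each place in the existing stylesheets where a line break is wanted.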

I think the problem is:

  <xsl:output method="html" indent="yes"/> 

If I remember correctly, the html output method only cares about whitespace that affects how the HTML will be displayed.

If you try:

  <xsl:output method="xml" indent="yes"/> 

Then it should create the indented whitespace you expect.

Whitespace-only text nodes in the stylesheet are always ignored unless they are contained in xsl:text. If you want to output whitespace to the result tree, use xsl:text.

(It's also possible to use xml:space="preserve" in the stylesheet, but it's generally not advisable, as it has unwanted side effects.)
