How to transform HTML string using XSLT in Node.js

I have a string in a Node.js app that I need to transform using XSLT on the server side. The main "transformations" I need to do are removing specific HTML tags, and I can't use regex due to security/performance issues. I will also be using the result of the transformation to make POST requests to an API.

A simple example may look something like:

"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.</p>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti.</p>"

And I need to transform it to the following (basically just remove the <p> tags in this case):

"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.\nLorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti."

Here are the main questions I have:

  • Can I use saxon-js to make these changes? If so, I am struggling to figure out how from their docs.
  • Is there another way to implement XSLT in a Node app?

Well, I took the bait, here is how you can do that with SaxonJS:

const SaxonJS = require("saxon-js");

const input = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.</p>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti.</p>";

// No template rules are declared, so the built-in rules apply;
// with method="text" only the text nodes reach the output.
const xslt = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="text"/>
</xsl:stylesheet>`;

// Run the stylesheet through the XPath 3.1 transform() function.
// parse-xml-fragment() accepts input that has no single root element.
const result = SaxonJS.XPath.evaluate(`transform(
  map {
    'source-node' : parse-xml-fragment($xml),
    'stylesheet-text' : $xslt,
    'delivery-format' : 'serialized'
  }
)?output`,
[],
{ params : {
    xml : input,
    xslt : xslt
  }
});

console.log(result);

Using output method text will remove all elements. If you don't want that, use the default output method and add <xsl:mode on-no-match="shallow-skip"/>, then add templates for the elements you want to preserve, e.g. <xsl:template match="h1"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template> (note that shallow-skip also drops unmatched text nodes, so the text you want to keep needs a matching template as well). Or approach it the other way around: use <xsl:mode on-no-match="shallow-copy"/> and block what you don't want with matching templates, e.g. <xsl:template match="p"><xsl:apply-templates/></xsl:template>.

And in the end, once your stylesheet works and is finished, you should "compile" it to SEF/JSON with e.g. xslt3 -nogo -xsl:sheet.xsl -export:sheet.sef.json and then use the direct transformation API from JavaScript, e.g. SaxonJS.transform.
