
How to transform HTML string using XSLT in Node.js

I have an HTML string in a Node.js app that I need to transform using XSLT on the server side. The main "transformations" I need are removing specific HTML tags, and I can't use regex because of security/performance concerns. I will then use the result of the transformation to make POST requests to an API.

A simple example may look something like:

"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.</p>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti.</p>"

And I need to transform it to the following (basically just remove <p> tags in this case):

"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.\nLorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti."

Here are the main questions I have:

  • Can I use saxon-js to make these changes? If so, I am struggling to figure out how based on their docs.
  • Is there another way to implement XSLT on a node app?

Well, I took the bait. Here is how you can do that with SaxonJS:

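const SaxonJS = require("saxon-js");

// The HTML fragment from the question (two top-level <p> elements)
const input = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Aliquam porttitor gravida velit, et facilisis est viverra a. Suspendisse potenti.</p>\n<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed suscipit felis. Suspendisse potenti.</p>";

// A stylesheet with no templates: the XSLT 3.0 built-in rules copy only text nodes,
// and method="text" serializes them as plain text (no XML declaration, no escaping)
const xslt = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="text"/>
</xsl:stylesheet>`;

// fn:transform compiles the stylesheet text at run time and applies it to the parsed
// fragment; ?output picks the serialized principal result out of the result map
const result = SaxonJS.XPath.evaluate(`transform(
  map {
    'source-node' : parse-xml-fragment($xml),
    'stylesheet-text' : $xslt,
    'delivery-format' : 'serialized'
  }
)?output`,
  null, // no context item is needed
  { params : { xml : input, xslt : xslt } }
);

console.log(result);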

Using output method text removes all elements. If you don't want that, use the default output method and either:

  • keep the default <xsl:mode on-no-match="text-only-copy"/> (unmatched elements are dropped but their text is kept) and add templates for the elements you want to preserve, e.g. <xsl:template match="h1"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>; or
  • approach it the other way around: use <xsl:mode on-no-match="shallow-copy"/> and remove what you don't want with matching templates, e.g. <xsl:template match="p"><xsl:apply-templates/></xsl:template>, which drops the <p> tags but keeps their content.
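If the set of tags to remove varies, the shallow-copy stylesheet could be generated from a list of element names. The sketch below is a hypothetical helper (buildStripStylesheet is not part of SaxonJS) that only builds the stylesheet text; the result could then be passed as $xslt to the transform() call above. It also sets omit-xml-declaration so the serialized result does not start with an XML declaration.

```javascript
// Hypothetical helper, not part of SaxonJS: builds the text of an XSLT 3.0
// stylesheet that copies the input unchanged (shallow-copy) except for the
// listed elements, whose tags are dropped while their content is kept.
function buildStripStylesheet(tagsToStrip) {
  const templates = tagsToStrip.map((tag) => {
    // Accept only plain element names, so a caller-supplied value cannot
    // break out of the match attribute and inject arbitrary XSLT.
    if (!/^[A-Za-z_][A-Za-z0-9_.-]*$/.test(tag)) {
      throw new Error(`Unsupported element name: ${tag}`);
    }
    // Process the element's children but do not copy the element itself.
    return `  <xsl:template match="${tag}"><xsl:apply-templates/></xsl:template>`;
  });

  return [
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">',
    '  <xsl:output omit-xml-declaration="yes"/>',
    '  <xsl:mode on-no-match="shallow-copy"/>',
    ...templates,
    "</xsl:stylesheet>",
  ].join("\n");
}

// Example: strip only <p>, keep every other element.
console.log(buildStripStylesheet(["p"]));
```

Keep in mind that parse-xml-fragment only accepts well-formed markup, so HTML such as an unclosed <br> raises an error before any template runs.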

Finally, once your stylesheet is finished and works, you should "compile" it to SEF/JSON, e.g. with xslt3 -nogo -xsl:sheet.xsl -export:sheet.sef.json, and then use the direct transformation API from JavaScript, e.g. SaxonJS.transform, so the stylesheet is not recompiled from source text on every call.
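A rough sketch of that last step, under stated assumptions: because the input has two top-level <p> elements it is not a well-formed XML document, so rather than passing it as a source document this passes the string as a stylesheet parameter and starts at xsl:initial-template. The file name sheet.sef.json, the parameter name xml and the function stripWithCompiledSheet are hypothetical; the optional saxon argument only exists so the function can be exercised without a real SEF file.

```javascript
// Assumed (hypothetical) sheet.xsl, compiled with
//   xslt3 -nogo -xsl:sheet.xsl -export:sheet.sef.json
//
// <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
//     xmlns:xs="http://www.w3.org/2001/XMLSchema" version="3.0">
//   <xsl:param name="xml" as="xs:string" required="yes"/>
//   <xsl:output method="text"/>
//   <xsl:template name="xsl:initial-template">
//     <xsl:apply-templates select="parse-xml-fragment($xml)"/>
//   </xsl:template>
// </xsl:stylesheet>

// Runs the precompiled stylesheet on an HTML fragment string and returns the
// serialized result. saxon defaults to the real saxon-js module.
function stripWithCompiledSheet(sefPath, html, saxon = require("saxon-js")) {
  const result = saxon.transform(
    {
      stylesheetFileName: sefPath, // the exported SEF/JSON file
      initialTemplate: "Q{http://www.w3.org/1999/XSL/Transform}initial-template",
      stylesheetParams: { xml: html }, // parsed inside the stylesheet
      destination: "serialized", // return the principal result as a string
    },
    "sync"
  );
  return result.principalResult;
}
```

Usage would then be something like stripWithCompiledSheet("sheet.sef.json", input), with input being the string from the question.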
