Is it possible to cache XML documents in Saxon to avoid re-parsing and re-indexing?

Question

I am currently assessing whether XSLT3 with Saxon could be useful for our purposes. Please hear me out.

We are developing a REST API which provides credentials given an input request XML. Basically, there are 3 files in play:

site.xml:
- This file holds the data representing the complete organisation: users, roles, credentials, settings, ...
- It could easily contain 10,000 lines.
- It can be considered static/immutable.
- You could compare it to an XML representation of a database, so to speak.
request.xml:
- This file holds the request as provided to the REST API.
- It is rather small, usually around 10 to 50 lines.
- It is different for each request.
request.xslt:
- This file holds the stylesheet that converts the given request.xml to an output XML.
- It loads site.xml via the XSLT document() function, as it needs that data to fulfill the request.

The problem here is that loading site.xml in request.xslt takes a long time. In addition, for each request, indexes as introduced by the XSLT <xsl:key .../> directive must be rebuilt. This adds up.

So it would make sense to somehow cache site.xml, to avoid having to parse and index that file for every request.

It's important to note that multiple API requests can happen concurrently, thus it should be safe to share this cached site.xml between several ongoing XSLT transformations.

Is this possible with Saxon (Java)? How would that work?

Update 1

After some additional reflection, I realize that maybe I should not attempt to cache just the site.xml file, but request.xslt instead? This assumes that site.xml, which is loaded in request.xslt via document(), is part of that cache.

Answer 1

It would help if you show/tell us which API you use to run XSLT with Saxon.

As for caching the XSLT, with JAXP I think you can do that with a Templates object created with newTemplates from the TransformerFactoryImpl ( http://saxonica.com/html/documentation/using-xsl/embedding/jaxp-transformation.html ); each time you want to run the XSLT you will need to create a Transformer with newTransformer().
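The Templates approach can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name CachedStylesheet, the constant DEMO_XSLT and the tiny stylesheet are hypothetical, and TransformerFactory.newInstance() is used so that the code picks up Saxon when it is on the classpath and otherwise falls back to the JDK's built-in XSLT 1.0 processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: compiles a stylesheet once and reuses it for every request. */
final class CachedStylesheet {

    /** Illustrative stylesheet only; a real request.xslt would be loaded from a file. */
    static final String DEMO_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/request'><response user='{@user}'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Templates is the compiled stylesheet; JAXP specifies it as safe to share across threads.
    private final Templates templates;

    CachedStylesheet(String xsltSource) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        this.templates = factory.newTemplates(new StreamSource(new StringReader(xsltSource)));
    }

    /** Runs one transformation; a fresh Transformer is created per call because it is not thread-safe. */
    String transform(String requestXml) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(requestXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        CachedStylesheet cached = new CachedStylesheet(DEMO_XSLT); // compile once at startup
        System.out.println(cached.transform("<request user='alice'/>"));
        System.out.println(cached.transform("<request user='bob'/>"));
    }
}
```

Only the stylesheet compilation is cached here; whether site.xml is also reused depends on how the stylesheet obtains it.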

With the s9api API you can compile once to get an XsltExecutable ( http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XsltExecutable.html ) that "is immutable, and therefore thread-safe"; you then have to use load() or load30() to create an XsltTransformer or Xslt30Transformer each time you need to run the code.

As for sharing a document, see http://saxonica.com/html/documentation/sourcedocs/preloading.html :

An option is available (Feature.PRE_EVALUATE_DOC_FUNCTION) to indicate that calls to the doc() or document() functions with constant string arguments should be evaluated when a query or stylesheet is compiled, rather than at run-time. This option is intended for use when a reference or lookup document is used by all queries and transformations

The section on that configuration option, however, states:

In XSLT 3.0 a better way of having external documents pre-loaded at stylesheet compile time is to use the new facility of static global variables.

So in that case you could declare

<xsl:variable name="site-doc" static="yes" select="doc('site.xml')"/>

You will need to wait for Michael Kay's response as to whether that suffices to share the document.

Answer 2

Well, it is certainly possible, but the best way of doing it depends a little on the circumstances, e.g. what happens when site.xml changes.

I would be inclined to create a single s9api Processor at application startup, and immediately (that is, during application initialization) load site.xml into an XdmNode using the build() method of a DocumentBuilder obtained from Processor.newDocumentBuilder(); this can then be passed as a parameter value (an <xsl:param>) into each transformation that uses it. Or if you prefer to access it using document(), you could register a URIResolver that responds to the document() call by returning the relevant XdmNode.

As for indexing and the key() function, so long as the xsl:key definition is "shareable", then if two transformations based on the same compiled stylesheet (s9api XsltExecutable) access the same document, the index will not be rebuilt. An xsl:key definition is shareable if its match and use attributes do not depend on anything that can vary from one transformation to another, such as the content of global variables or parameters.
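A rough sketch of this setup, assuming Saxon 9.9 or later on the classpath, might look like the following. The class name SiteLookup, the parameter name site, the key name user-by-id and the sample XML are all hypothetical and chosen only for illustration; the xsl:key depends only on the matched nodes, so it should count as shareable in the sense described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltExecutable;

/** Hypothetical service object, created once at application startup. */
final class SiteLookup {

    /** Illustrative stylesheet: looks up the requesting user's role in the shared site document. */
    static final String DEMO_XSLT =
        "<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='site' as='document-node()'/>"
      + "<xsl:key name='user-by-id' match='user' use='@id'/>"
      + "<xsl:template match='/request'>"
      + "<response role=\"{key('user-by-id', @user, $site)/@role}\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private final Processor processor = new Processor(false); // false = no licensed features requested
    private final XdmNode site;              // parsed once, shared by all requests
    private final XsltExecutable stylesheet; // compiled once, immutable

    SiteLookup(String siteXml, String xslt) throws SaxonApiException {
        site = processor.newDocumentBuilder().build(new StreamSource(new StringReader(siteXml)));
        stylesheet = processor.newXsltCompiler().compile(new StreamSource(new StringReader(xslt)));
    }

    /** Handles one request; the transformer is per-call, the site tree and stylesheet are shared. */
    String handle(String requestXml) throws SaxonApiException {
        Xslt30Transformer transformer = stylesheet.load30();
        transformer.setStylesheetParameters(Collections.singletonMap(new QName("site"), site));
        StringWriter out = new StringWriter();
        Serializer serializer = processor.newSerializer(out);
        serializer.setOutputProperty(Serializer.Property.OMIT_XML_DECLARATION, "yes");
        transformer.applyTemplates(new StreamSource(new StringReader(requestXml)), serializer);
        return out.toString();
    }

    public static void main(String[] args) throws SaxonApiException {
        SiteLookup lookup = new SiteLookup("<site><user id='u1' role='admin'/></site>", DEMO_XSLT);
        System.out.println(lookup.handle("<request user='u1'/>"));
    }
}
```

Passing the document as a parameter keeps the stylesheet free of any hard-coded location for site.xml, which may make it easier to swap in a fresh XdmNode if the file ever changes.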

Saxon's native tree implementations (unlike the DOM) are thread-safe: if you build a document once, you can access it in multiple threads. The building of indexes to support the key() function is synchronized so concurrent transformations will not interfere with each other.

Martin's suggestion of allowing compile-time evaluation of the document() call would also work. You could also put the document into a global variable defined with static="yes". This doesn't play well, however, with exporting compiled stylesheets into persistent files: there are some restrictions that apply when exporting a stylesheet that contains node-valued static variables.

2 answers

solution1
1 ACCEPTED 2019-09-12 13:17:18

solution2
1 2019-09-12 14:39:03
