
XSLT Performance Considerations

I am working on a project that uses the following technologies: Java, XML, XSL.

There's heavy use of XML. Quite often I need to:
- convert one XML document into another;
- convert one XML document into another after applying some business logic.

Everything will be built into an EAR and deployed on an application server. As the number of users is huge, I need to take performance into consideration before defining coding standards.

I am not a big fan of XSL, but I am trying to understand whether XSL is the better option in this scenario or whether I should stick with Java only. Note that I only need to convert XML into XML; I have no requirement to convert XML into other formats such as HTML.

From a performance and maintainability point of view, isn't Java a better option than XSLT for XML-to-XML transformations?

From my previous experience with this kind of application, if you have a performance bottleneck, it won't be the XSLT processing. (The only exception might be if the processing is very complex and the programmer is very inexperienced in XSLT.) There may be performance bottlenecks in XML parsing or serialisation if you are dealing with large documents, but these apply whatever technology you use for the transformation.

Simple transformations are much simpler to code in XSLT than in Java. Complex transformations are also usually simpler to code in XSLT, unless they make heavy use of functionality available for free in the Java class library (date parsing might be an example). Of course, that's only true for people equally comfortable coding in both languages.
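
To make the comparison concrete, here is a minimal sketch of one simple mapping written both ways. It uses only the JAXP API that ships with the JDK (`javax.xml.transform`), whose built-in processor supports XSLT 1.0. The element names (`people`, `person`, `name`, `customer`, `fullName`) are invented for illustration. The DOM version assumes every `person` has a `name`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SimpleRename {
    // Hypothetical mapping: <person><name>X</name></person> becomes <customer fullName="X"/>
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/people\">"
      + "<customers><xsl:for-each select=\"person\">"
      + "<customer fullName=\"{name}\"/>"
      + "</xsl:for-each></customers>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // XSLT version: the whole mapping lives in the stylesheet
    static String withXslt(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Plain-Java (DOM) version of the same mapping
    static String withDom(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document in = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Document result = dbf.newDocumentBuilder().newDocument();
        Element customers = result.createElement("customers");
        result.appendChild(customers);

        NodeList people = in.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element person = (Element) people.item(i);
            // Assumes every <person> contains a <name> element
            String name = person.getElementsByTagName("name").item(0).getTextContent();
            Element customer = result.createElement("customer");
            customer.setAttribute("fullName", name);
            customers.appendChild(customer);
        }

        // Serialize the new tree with an identity transformer
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(result), new StreamResult(out));
        return out.toString();
    }
}
```

Even at this size, the XSLT version is a single declarative template. The DOM version spends most of its lines on tree plumbing.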

Of course, it's impossible to give any more than arm-waving advice about performance until you start talking concrete numbers.
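
One way to get such numbers is to time parsing, transformation and serialisation separately. This also tests the point above that parsing and serialisation, rather than the transformation itself, may dominate. The sketch below uses the JDK's built-in processor. The class and field names are hypothetical. The timings only mean something on realistically sized documents, after the JVM has warmed up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PhaseTiming {
    // Elapsed nanoseconds of each phase, plus the serialized output for checking
    static final class Result {
        final long parseNanos, transformNanos, serializeNanos;
        final String output;
        Result(long parse, long transform, long serialize, String output) {
            this.parseNanos = parse;
            this.transformNanos = transform;
            this.serializeNanos = serialize;
            this.output = output;
        }
    }

    static Result time(String xml, String xslt) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Stylesheet compilation is kept outside the measured phases
        Templates templates = tf.newTemplates(new StreamSource(new StringReader(xslt)));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT expects a namespace-aware DOM
        DocumentBuilder builder = dbf.newDocumentBuilder();

        long t0 = System.nanoTime();
        Document input = builder.parse(new InputSource(new StringReader(xml)));
        long t1 = System.nanoTime();

        DOMResult tree = new DOMResult(); // keep the result in memory, unserialized
        templates.newTransformer().transform(new DOMSource(input), tree);
        long t2 = System.nanoTime();

        StringWriter out = new StringWriter();
        tf.newTransformer().transform(new DOMSource(tree.getNode()), new StreamResult(out));
        long t3 = System.nanoTime();

        return new Result(t1 - t0, t2 - t1, t3 - t2, out.toString());
    }
}
```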

I agree with the above responses. XSLT is faster and more concise to develop with than performing transformations in Java. You can change the XSLT without having to recompile the entire application (just re-create the EAR and redeploy). Manual transformations should always be faster, but the code may be much larger than the XSLT, because XPath and related technologies allow very condensed and powerful expressions.

Try several XSLT engines (the one provided with Java, Saxon, Xalan...), and debug and profile the XSLT with tools like the standalone IDE Altova XMLSpy to detect bottlenecks. Load the XSLT transformation once and reuse it when processing several XML documents that require the same transformation. Another option is to compile the XSLT to Java classes, allowing faster parsing (Saxon seems to allow it). However, changes are then not as easy, because you need to recompile the XSLT and the generated classes.
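
In JAXP terms, "load the transformation once and reuse it" usually means two steps. First, compile the stylesheet once into a `javax.xml.transform.Templates` object. Then create a `Transformer` from it for each unit of work. Below is a minimal sketch, assuming a hypothetical `order`-to-`invoice` stylesheet held in a string. Processors that implement JAXP, such as Saxon and Xalan, can sit behind the same interfaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ReusedTemplates {
    // Hypothetical stylesheet: <order id="N"/> becomes <invoice ref="N"/>
    private static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/order\"><invoice ref=\"{@id}\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Parsed and compiled once per class load; Templates is thread-safe and shareable
    private static final Templates TEMPLATES = compile(XSLT);

    private static Templates compile(String xslt) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet failed to compile", e);
        }
    }

    // Transforms a batch of documents with the same compiled stylesheet
    static List<String> transformAll(List<String> docs) throws TransformerException {
        // A Transformer may be reused serially, but must not be shared between threads
        Transformer t = TEMPLATES.newTransformer();
        List<String> results = new ArrayList<>();
        for (String doc : docs) {
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }
}
```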

We use XSLT and XSL-FO to generate invoices for billing software. We extract the data from the database, create an XML file, transform it with XSLT into XSL-FO, and process the resulting XML (FO instructions) with Apache FOP to generate a PDF. When generating invoices of several pages, the job is done in less than a second in a multi-user environment on a per-request basis (online processing). We also do batch processing (billing cycles), and there the job is done faster because the XSLT transformation is reused. Only for very large PDF documents (>100 pages) do we have some trouble (minutes). Even then, the most expensive task is always processing the FO XML into PDF, not the XML-to-XML transformation with XSLT.

As always, if you need more processing power, you can just "add" more processors and easily run the jobs in parallel. I think the time saved by using XSLT, if you have some experience with it, can be spent on buying more hardware. It's the dichotomy between two approaches: using powerful development tools to save development time and buying more hardware, or doing things "manually" to get maximum performance.
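
If the parallel route is taken within one JVM, the JAXP contract matters. A `Templates` object is documented as thread-safe, while a `Transformer` must not be used by several threads concurrently. The sketch below shares one compiled stylesheet across a fixed thread pool and gives each task its own `Transformer`. The doubling stylesheet and the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParallelTransforms {
    // Hypothetical stylesheet: <n>3</n> becomes <doubled>6</doubled>
    static final String DOUBLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/n\"><doubled><xsl:value-of select=\". * 2\"/></doubled></xsl:template>"
      + "</xsl:stylesheet>";

    static Templates compile(String xslt) throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // One shared Templates, one Transformer per task; results come back in input order
    static List<String> transformInParallel(Templates templates, List<String> docs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> {
                    Transformer t = templates.newTransformer(); // not thread-safe, never shared
                    StringWriter out = new StringWriter();
                    t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // a failed task surfaces as an ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A `ThreadLocal<Transformer>` is a common alternative if creating a `Transformer` per task shows up in a profile.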

Integration tools like ESBs rely heavily on XSLT transformations to adapt XML data from one system (sender) to another (receiver). They can usually perform hundreds of "transactions" (data processing and integration) per second.

If you use a modern XSLT processor, such as Saxon (available in a free version), you will find the performance to be quite good. Also, in the long term XSL transforms will be much more maintainable than hardcoded Java classes.

(I have no connection with the authors of Saxon.)

Here are my observations, based on empirical data. I use XSLT extensively, in many cases as an alternative to data processors implemented in Java. Some of the data processors we have built are fairly involved. We primarily use Saxon EE through the oXygen XML Editor. Here is what we have noticed about the performance of the transformations.

For less complex XSL stylesheets, the performance is quite good: 2 s to read a 30 MB XML file and generate over 20 HTML content pages with a lot of div structures. The processing time seems to grow roughly linearly, or less, with the size of the file.

However, when the complexity of the XSL stylesheet changes, the performance change can be dramatic, seemingly exponential. In one case, we introduced a frequently called function into a template, where the function implemented a simple XPath resolution. For the same file, processing time went from 2 s to 24 s. Functions and function calls seem to be a major culprit. That said, we have not done a detailed performance review or any code optimization. We are still in alpha, and the performance is still within our limits (i.e. batch jobs).

I must admit that we may have "abused" xsl:function, since in a lot of places we abstracted code into functions (in addition to using templates). My suspicion is that, because of the way XSLT templates are invoked, the processor's implementation may involve a lot of recursion, and function calls can become expensive if they are not optimized. We think a change of "strategy" in how we write our XSL scripts, to be more XSLT/XPath-centric (for instance, using xsl:key), may help the performance of the XSLT processor. So yes, we may be just as guilty as the processor charged :)
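
The `xsl:key` idea above works like this: a key declares an index that the processor can build once, so that lookups by value need not rescan the document. Below is a minimal XSLT 1.0 sketch, run on the JDK's built-in processor rather than Saxon EE, so it does not use `xsl:function`. The `customer`/`order` data is hypothetical. The naive equivalent would be `//customer[@id = current()/@customer]/@name`. A processor may evaluate that by scanning every customer for each order, although optimizing processors can sometimes rewrite such expressions themselves.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class KeyLookup {
    // Hypothetical data model: orders reference customers by id
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output omit-xml-declaration=\"yes\"/>"
      // Declare an index of <customer> elements keyed on their id attribute
      + "<xsl:key name=\"customer-by-id\" match=\"customer\" use=\"@id\"/>"
      + "<xsl:template match=\"/data\"><report>"
      + "<xsl:for-each select=\"orders/order\">"
      // key() looks the customer up in the index instead of rescanning //customer
      + "<line name=\"{key('customer-by-id', @customer)/@name}\" total=\"{@total}\"/>"
      + "</xsl:for-each>"
      + "</report></xsl:template>"
      + "</xsl:stylesheet>";

    static String report(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```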

Another performance issue is memory utilization. RAM is not technically a problem, but a simple processor ramping from 1 GB (!!!) to 6 GB for a single invocation/transformation is not exactly kosher. There may be scalability and capacity concerns, depending on the application and usage. This may have less to do with the underlying XSLT processor and more to do with the editor tool. It seems to have a huge impact when debugging the stylesheets in real time (i.e. stepping through the XSLT).

A few observations:
- command-line or "production" invocation of the processor has better performance;
- across consecutive runs of the XSLT processor, the first run takes the longest (say 10 s), and subsequent runs take a lot less (say 4 s). Again, this may have something to do with the editor environment.
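
The first-run effect can be checked outside the editor by timing repeated runs inside one JVM. This sketch compiles the stylesheet once and then times each transformation separately. On a typical JVM the first runs would be expected to include class loading and JIT compilation. The actual figures depend entirely on the processor, the stylesheet and the data. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WarmupTiming {
    // Runs the same transformation `runs` times in one JVM and returns
    // the elapsed time of each run in microseconds
    static long[] timeRuns(String xslt, String xml, int runs) throws Exception {
        // Compile once, outside the loop, so only the transformations are timed
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xslt)));
        long[] micros = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            StringWriter out = new StringWriter();
            templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            micros[i] = (System.nanoTime() - start) / 1_000;
        }
        return micros;
    }
}
```

Printing the returned array for a realistic input shows whether later runs settle well below the first. If they do, the production process should be kept warm rather than started per transformation.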

That said, the performance of the processors may be a pain at times, depending on the application requirements. Still, it is my opinion that you should weigh the other factors already mentioned here when comparing an XSLT implementation with one in Java (or another language). These include code maintenance, ease of implementation, rapid changes and the size of the code base. With those in view, the performance issues may be mitigated, or can be "accepted" if the end application can still live with the performance numbers.

...adieu!

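Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM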