Xalan XSLT - Out of Memory Heap Space

My project has a reporting module that gathers data from the database as XML and runs an XSLT on it to generate the report in the user's desired format. The options at this point are HTML and CSV.

We use Java and Xalan for all interaction with the data.

The bad part is that one of the reports the user can request is 143MB (about 430,000 records) for the XML portion alone. When this is transformed into HTML, I run out of heap space even with a maximum of 4096G reserved for heap. This is unacceptable.

It seems that the problem is simply too much data, but I can't help thinking there is a better way to deal with this than limiting the customer and failing to meet functional requirements.

I am glad to give more information as needed, but I cannot disclose too much about the project, as I'm sure most of you understand. Also, the answer is yes: I need all of the data at the same time, and I cannot paginate it.

Thanks

EDIT

All the transformation classes I am using are in the javax.xml.transform package. The implementation looks like this:

final Transformer transformer = 
  TransformerFactory.newInstance().newTransformer(
    new StreamSource(new StringReader(xsl)));
final StringWriter outWriter = new StringWriter();
transformer.transform(
  new StreamSource(new StringReader(xml)), new StreamResult(outWriter));
return outWriter.toString();

If possible, I would like to leave the XSLT the way it is. Using StreamSource should allow some of the data to be garbage collected as it is processed, but I'm not sure what limitations on the XSLT (functions, etc.) this requires for proper cleanup to happen. If someone could point me at a resource detailing those limitations, it would be very helpful.

The problem with XSLT is that the processor needs a DOM-like tree representation of the whole source document (as well as the result document) in memory while doing the transformation. For large XML files this is a serious problem.

You are interested in a system that allows a streaming transformation, where the full documents do not have to reside in memory. Maybe STX is an option: http://www.xml.com/pub/a/2003/02/26/stx.html and http://stx.sourceforge.net/. It is quite similar to XSLT, so if your XSLT stylesheet is applied to the XML in a straightforward manner, rewriting it in STX could be quite simple.
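To make the idea of a streaming transformation concrete, here is a minimal sketch. It is not STX; it uses the JDK's StAX pull parser as a stand-in, and the element name `record` and the class and method names are assumptions made up for illustration. It turns each record into one CSV line and keeps only the current record in memory. It does no CSV quoting or escaping.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StreamingCsvSketch {

    // Reads hypothetical <record> elements one at a time and writes one CSV line
    // per record. Each child element of <record> is assumed to contain text only.
    static void toCsv(Reader xml, Writer csv) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        StringBuilder line = new StringBuilder();
        boolean inRecord = false;
        boolean firstField = true;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (reader.getLocalName().equals("record")) {
                    // Start a fresh line; the previous record is no longer referenced.
                    inRecord = true;
                    firstField = true;
                    line.setLength(0);
                } else if (inRecord) {
                    if (!firstField) {
                        line.append(',');
                    }
                    // getElementText() reads the text and moves to the matching end tag.
                    line.append(reader.getElementText());
                    firstField = false;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("record")) {
                csv.write(line.toString());
                csv.write('\n');
                inRecord = false;
            }
        }
        reader.close();
    }
}
```

Passing a file-backed Reader and Writer keeps memory use roughly constant regardless of how many records the document contains, which is the property a streaming transformation language like STX is meant to give you.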

We were able to improve this by doing two things.

  1. We write the XML source and the destination format to files in a temp directory. This keeps the initial creation and storage out of RAM, since the data comes from a database and is written back to the DB as well. A handle to the data is all that's necessary.

  2. Use the Saxon transformer from Saxonica. This allows a couple of things, including SAX-style transformations and the use of XSLT 2.0, which Xalan does not support.
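The first step above can be sketched roughly as follows. This is an illustration under assumptions, not the code we actually run: the class name, method name and file paths are hypothetical. It uses only the standard javax.xml.transform API, so it works with whichever JAXP implementation is configured. With the Saxon jar on the classpath, Saxon can be selected through the standard `javax.xml.transform.TransformerFactory` system property.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FileBasedTransform {

    // Transforms the file xmlIn with the stylesheet xsl and writes the result to out.
    // Source and result are streamed from and to disk, so neither the input XML
    // nor the generated report is ever held as a Java String.
    static void transform(Path xsl, Path xmlIn, Path out) throws Exception {
        // Uses whichever JAXP implementation is configured (Xalan, Saxon, JDK default).
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(xsl.toFile()));
        try (InputStream in = new BufferedInputStream(Files.newInputStream(xmlIn));
             OutputStream os = new BufferedOutputStream(Files.newOutputStream(out))) {
            transformer.transform(new StreamSource(in), new StreamResult(os));
        }
    }

    // Example invocation: java FileBasedTransform report.xsl data.xml report.html
    public static void main(String[] args) throws Exception {
        transform(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

Compared with the StringReader/StringWriter version in the question, this removes two full in-memory copies of the data (the input String and the output buffer). The processor may still build its own internal tree of the source document, so how much this helps on its own depends on the transformer in use.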
