OutOfMemoryError: Java heap space using XSLT transform

I want to transform an XML file using XSLT. I wrote:

import java.io.File;
import java.io.InputStream;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

TransformerFactory factory = TransformerFactory.newInstance();
// The stylesheet is loaded from the classpath
InputStream is = this.getClass().getResourceAsStream(getPathToXSLTFile());
Source xslt = new StreamSource(is);
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File(getInputFileName()));
transformer.transform(text, new StreamResult(new File(getOutputFileName())));

With an input file of about 10,000,000 lines, I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xml.internal.utils.FastStringBuffer.append(FastStringBuffer.java:682)
at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM.characters(SAX2DTM.java:2111)
at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.characters(SAXImpl.java:863)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.characters(AbstractSAXParser.java:546)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:455)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:421)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:556)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:739)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:351)
at ru.magnit.task.utils.AbstractXmlUtil.transformXML(AbstractXmlUtil.java:66)
at ru.magnit.task.EntryPoint.main(EntryPoint.java:72)

It fails on this line:

transformer.transform(text, new StreamResult(new File(getOutputFileName())));

What is the reason for this, and can it be optimized somehow without increasing the heap size?

UPDATE: My XSLT file:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="entries">
    <entries>
        <xsl:apply-templates/>
    </entries>
</xsl:template>

<xsl:template match="entry">
    <entry>
        <xsl:attribute name="field">
            <xsl:apply-templates select="*"/>
        </xsl:attribute>
    </entry>
</xsl:template>

</xsl:stylesheet>

In general, XSLT 1.0 and 2.0 work with a data model that pulls the complete XML input into a tree to allow full XPath navigation, so memory usage grows with the size of the input document.

So if your current document size leads to a memory shortage, there is not much you can do other than increase the heap space, at least not in general. Depending on your concrete XSLT code there might be processor-specific and stylesheet-specific optimizations, but you can't avoid the processor first pulling in the complete document. We would need to see your XSLT to tell whether it can be optimized. Profiling a stylesheet can help to identify areas to optimize, but I am not sure whether Xalan supports that.

And the stack trace might simply mean that Xalan already runs out of memory while building the DTM (its tree model) for your large input: the failing frames are in TransformerImpl.getDOM and SAX2DTM.characters, that is, during parsing. In that case optimizing the XSLT code obviously does not help, as it is not even executed.

A Java-specific approach you could try is to use https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/sax/SAXTransformerFactory.html instead, creating a SAX filter from your stylesheet and chaining it with a default Transformer that serializes the result of the filter. I think I once tried that and found it can consume less memory than the traditional approach with a Transformer.
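A minimal sketch of that approach, assuming the JDK's built-in JAXP implementation (which supports SAXTransformerFactory.FEATURE_XMLFILTER). The class SaxFilterTransform and its transform method are hypothetical names. newXMLFilter turns the stylesheet into an XMLFilter, and the identity TransformerHandler returned by newTransformerHandler() plays the role of the default Transformer that serializes the filter's output. As far as I know, an XSLT 1.0 processor such as the JDK's XSLTC still builds its tree from the filtered events, so measure memory use on your real input before relying on any savings.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

/** Hypothetical helper: stylesheet as a SAX filter, identity handler as serializer. */
public class SaxFilterTransform {

    public static void transform(Source xslt, InputSource input, Result output) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Only SAX-capable factories can turn a stylesheet into an XMLFilter.
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)
                || !tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("TransformerFactory does not support XMLFilter");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // The compiled stylesheet becomes a filter fed by a namespace-aware SAX parser.
        XMLFilter filter = stf.newXMLFilter(xslt);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        filter.setParent(parser);

        // newTransformerHandler() without a stylesheet is an identity transform,
        // used here only to serialize the filter's output events.
        TransformerHandler serializer = stf.newTransformerHandler();
        // The serializer does not see the stylesheet's xsl:output, so mirror
        // indent="yes" by hand (set before setResult() to be safe).
        serializer.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setResult(output);
        filter.setContentHandler(serializer);

        // Parsing drives the chain: parser -> stylesheet filter -> serializer.
        filter.parse(input);
    }

    // Usage: java SaxFilterTransform <stylesheet.xsl> <input.xml> <output.xml>
    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: SaxFilterTransform <xslt> <input> <output>");
            return;
        }
        transform(new StreamSource(new File(args[0])),
                  new InputSource(new File(args[1]).toURI().toString()),
                  new StreamResult(new File(args[2])));
    }
}
```

With the stylesheet from the question, this replaces the single transformer.transform(...) call; the stylesheet itself needs no changes.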

XSLT 3.0 tries to address the memory problem with a new streaming approach (https://www.w3.org/TR/xslt-30/#streaming-concepts); however, so far there is only one implementation, Saxon 9 EE, which is a commercial product. And in general a stylesheet is not automatically streamable; you have to rewrite it to make it streamable (if that is possible at all: for instance, sorting input nodes is not possible with streaming).

For instance, your posted stylesheet, converted to XSLT 3.0 to use streaming (no rewrite was necessary; I only had to declare the default mode as streamable), is

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="entries">
        <entries>
            <xsl:apply-templates/>
        </entries>
    </xsl:template>

    <xsl:template match="entry">
        <entry>
            <xsl:attribute name="field">
                <xsl:apply-templates select="*"/>
            </xsl:attribute>
        </entry>
    </xsl:template>

</xsl:stylesheet>

and both Saxon 9.8 EE and the beta of Exselt assess it as streamable.
