
JAXB - unmarshal OutOfMemory: Java Heap Space

I'm currently trying to use JAXB to unmarshal an XML file, but it seems the file is too large (~500 MB) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space at:

Unmarshaller um = JAXBContext.newInstance("com.sample.xml").createUnmarshaller();
Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));

I'm guessing this is because it's trying to load the entire XML file as an object graph, but the file is just too large for the Java heap space.

Is there any other, more memory-efficient method of parsing large (~500 MB) XML files? Or perhaps an unmarshaller property that may help me handle the large XML file?

Here's what my XML looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Export xmlns="wwww.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
   <Origin ID="foooo" />
   <WorkSets>
      <WorkSet>
         <Work>
            .....
         </Work>
         <Work>
            ....
         </Work>
         <Work>
            .....
         </Work>
      </WorkSet>
      <WorkSet>
         ....
      </WorkSet>
   </WorkSets>
</Export>

I'd like to unmarshal at the WorkSet level, while still being able to read through all of the Work elements in each WorkSet.

What does your XML look like? Typically for large documents I recommend people leverage a StAX XMLStreamReader so that the document can be unmarshalled by JAXB in chunks.

input.xml

In the document below there are many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time, to avoid running out of memory.

<people>
   <person>
       <name>Jane Doe</name>
       <address>
           ...
       </address>
   </person>
   <person>
       <name>John Smith</name>
       <address>
           ...
       </address>
   </person>
   ....
</people>

Demo

import java.io.*;
import javax.xml.stream.*;
import javax.xml.bind.*;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to the people element

        JAXBContext jc = JAXBContext.newInstance(Person.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            Person person = (Person) unmarshaller.unmarshal(xsr);
            // process person here; only one Person is held in memory at a time
        }
    }

}

Person

Instead of matching on the root element of the XML document, we need to add an @XmlRootElement annotation on the class corresponding to the local root of the XML fragment that we will be unmarshalling.

@XmlRootElement
public class Person {
}

You could increase the heap space using the -Xmx startup argument.
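For example, to run the demo class above with a 2 GB maximum heap (the 2 GB figure is just an illustration; pick a value your machine can spare):

```shell
# -Xmx sets the maximum heap size for the JVM; adjust the value as needed
java -Xmx2048m Demo
```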

For large files, SAX processing is more memory-efficient since it is event-driven and doesn't load the entire structure into memory.
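A minimal sketch of the SAX approach (class, method, and element names here are illustrative, not from the question): the parser invokes a callback per element, so memory use stays flat no matter how large the file is.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {

    // Count occurrences of an element without ever building a tree in memory.
    static int countElements(String xml, String elementName) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        // Callback fires once per start tag; only the current
                        // element's data is in memory at this point.
                        if (qName.equals(elementName)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people><person><name>Jane Doe</name></person>"
                   + "<person><name>John Smith</name></person></people>";
        System.out.println(countElements(xml, "person")); // prints 2
    }
}
```

The trade-off is that SAX hands you raw events, so any object you want (like the asker's Export) has to be assembled by hand in the callbacks.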

I've been doing a lot of research, in particular with regard to parsing very large input sets conveniently. It's true that you can combine StAX and JAXB to selectively parse XML fragments, but it's not always possible or preferable. If you're interested in reading more on the topic, please have a look at:

http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf

In this document I describe an alternative approach that is very straightforward and convenient to use. It parses arbitrarily large input sets, whilst giving you access to your data in a JavaBeans fashion.

Use SAX or StAX. But if the goal is to have an in-memory object representation of the file, you'll still need lots of memory to hold the contents of such a big file. In that case, your only hope is to increase the heap size using the -Xmx1024m JVM option (which sets the max heap size to 1024 MB).
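For comparison, here is a raw StAX cursor loop (without JAXB) over the same kind of structure as the question's document; class and element names are illustrative. Only the current event is held in memory, so the document can be arbitrarily large.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxCountDemo {

    // Pull events one at a time from the stream; memory use is constant
    // regardless of document size.
    static int countStartElements(String xml, String name) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (xsr.hasNext()) {
            // next() advances the cursor to the following event
            if (xsr.next() == XMLStreamConstants.START_ELEMENT
                    && xsr.getLocalName().equals(name)) {
                count++;
            }
        }
        xsr.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<WorkSets><WorkSet><Work/><Work/></WorkSet>"
                   + "<WorkSet><Work/></WorkSet></WorkSets>";
        System.out.println(countStartElements(xml, "Work")); // prints 3
    }
}
```

Unlike SAX's push callbacks, StAX lets you pull events on demand, which is what makes the hybrid StAX + JAXB chunking shown in the accepted answer possible.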

You can try this too. It's not exactly good practice, but it works:

http://amitsavm.blogspot.in/2015/02/partially-parsing-xml-using-jaxb-by.html

Otherwise, use StAX or SAX; what Blaise Doughan suggests is also good and could be called the standard way. But if you have a complex XML structure and you don't want to annotate your classes manually, preferring to generate them with the XJC tool, then the link above might be helpful.

Use SAX, but you'll have to build the Export object yourself.
