I'm currently trying to use JAXB to unmarshal an XML file, but it seems that the XML file is too large (~500mb) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space
JAXBContext jc = JAXBContext.newInstance("com.sample.xml");
Unmarshaller um = jc.createUnmarshaller();
Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));
I'm guessing this is because it's trying to build the entire XML file as an object graph, and the file is just too large for the Java heap space.
Is there a more memory-efficient method of parsing large (~500 MB) XML files? Or perhaps an unmarshaller property that would help me handle a file this large?
Here's what my XML looks like
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Export xmlns="www.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
    <Origin ID="foooo" />
    <WorkSets>
        <WorkSet>
            <Work>
                .....
            </Work>
            <Work>
                ....
            </Work>
            <Work>
                .....
            </Work>
        </WorkSet>
        <WorkSet>
            ....
        </WorkSet>
    </WorkSets>
</Export>
I'd like to unmarshal at the WorkSet level, while still being able to read through all of the Work elements for each WorkSet.
What does your XML look like? Typically for large documents I recommend people leverage a StAX XMLStreamReader so that the document can be unmarshalled by JAXB in chunks.
input.xml
In the document below there are many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time to avoid running out of memory.
<people>
<person>
<name>Jane Doe</name>
<address>
...
</address>
</person>
<person>
<name>John Smith</name>
<address>
...
</address>
</person>
....
</people>
Demo
import java.io.*;
import javax.xml.stream.*;
import javax.xml.bind.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to the root people element

        JAXBContext jc = JAXBContext.newInstance(Person.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            Person person = (Person) unmarshaller.unmarshal(xsr);
            // process each Person here; it becomes eligible for GC afterwards
        }
    }

}
Person
Instead of matching on the root element of the XML document, we need to add the @XmlRootElement annotation to the local root of the XML fragment that we will be unmarshalling.
@XmlRootElement
public class Person {
}
You could increase the heap space using the -Xmx startup argument, e.g. java -Xmx2g Demo (class name illustrative).
For large files, SAX processing is more memory-efficient since it's event-driven and doesn't load the entire structure into memory.
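As a minimal sketch of that event-driven style (the Work element name is taken from the XML above; the class name and the inline sample document are illustrative), a SAX handler reacts to each element as it streams past, so nothing like a full tree is ever held in memory:

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import java.io.StringReader;

public class SaxCountDemo {
    public static void main(String[] args) throws Exception {
        // Small inline sample; for a 500 MB file you would pass a FileReader instead.
        String xml = "<WorkSets><WorkSet><Work/><Work/></WorkSet>"
                   + "<WorkSet><Work/></WorkSet></WorkSets>";

        final int[] workCount = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("Work".equals(qName)) {
                    workCount[0]++; // handle each Work element as the parser reaches it
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        System.out.println("Work elements: " + workCount[0]); // prints 3
    }
}
```

In a real handler you would accumulate character data and build your own objects in startElement/endElement, which is exactly the extra work SAX demands compared to JAXB.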
I've been doing a lot of research, particularly on parsing very large input sets conveniently. It's true that you can combine StAX and JAXB to selectively parse XML fragments, but it's not always possible or preferable. If you're interested in reading more on the topic, please have a look at:
http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf
In this document I describe an alternative approach that is very straightforward and convenient to use. It parses arbitrarily large input sets while giving you access to your data in a JavaBeans fashion.
You can try this too; it's not exactly good practice, but it works:
http://amitsavm.blogspot.in/2015/02/partially-parsing-xml-using-jaxb-by.html
Otherwise use StAX or SAX, or the approach Blaise Doughan describes, which is also good and can be considered the standard way. The link above may help if you have a complex XML structure and you don't want to annotate your classes manually, but would rather generate them with the XJC tool.
SAX, but you would have to build the Export object yourself.