I'm currently trying to use JAXB to unmarshal an XML file, but it seems that the XML file is too large (~500mb) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space
JAXBContext jc = JAXBContext.newInstance("com.sample.xml");
Unmarshaller um = jc.createUnmarshaller();
Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));
I'm guessing this is because it's trying to build the entire XML file as an object graph, and the file is just too large for the Java heap space.
Is there a more memory-efficient method of parsing large (~500 MB) XML files? Or perhaps an unmarshaller property that would help me handle a file this large?
Here's what my XML looks like
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Export xmlns="www.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
    <Origin ID="foooo" />
    <WorkSets>
        <WorkSet>
            <Work>
                .....
            </Work>
            <Work>
                ....
            </Work>
            <Work>
                .....
            </Work>
        </WorkSet>
        <WorkSet>
            ....
        </WorkSet>
    </WorkSets>
</Export>
I'd like to unmarshal at the WorkSet level, while still being able to read through all of the Work elements for each WorkSet.
What does your XML look like? Typically for large documents I recommend people leverage a StAX XMLStreamReader so that the document can be unmarshalled by JAXB in chunks.
input.xml
In the document below there are many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time to avoid running out of memory.
<people>
<person>
<name>Jane Doe</name>
<address>
...
</address>
</person>
<person>
<name>John Smith</name>
<address>
...
</address>
</person>
....
</people>
Demo
import java.io.*;
import javax.xml.stream.*;
import javax.xml.bind.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to the root people element

        JAXBContext jc = JAXBContext.newInstance(Person.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            Person person = (Person) unmarshaller.unmarshal(xsr);
            // process each Person here; it becomes eligible for GC afterwards
        }
    }

}
Person
Instead of matching on the root element of the XML document, we need to add the @XmlRootElement annotation to the local root of the XML fragment that we will be unmarshalling.
@XmlRootElement
public class Person {
}
You could increase the heap space using the -Xmx startup argument, e.g. java -Xmx2g Demo (class name illustrative).
For large files, SAX processing is more memory-efficient since it's event-driven and doesn't load the entire structure into memory.
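As a minimal sketch of that event-driven style (the Work element name is taken from the XML above; the class name and the inline sample document are illustrative), a SAX handler reacts to each element as it streams past, so nothing like a full tree is ever held in memory:

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import java.io.StringReader;

public class SaxCountDemo {
    public static void main(String[] args) throws Exception {
        // Small inline sample; for a 500 MB file you would pass a FileReader instead.
        String xml = "<WorkSets><WorkSet><Work/><Work/></WorkSet>"
                   + "<WorkSet><Work/></WorkSet></WorkSets>";

        final int[] workCount = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("Work".equals(qName)) {
                    workCount[0]++; // handle each Work element as the parser reaches it
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        System.out.println("Work elements: " + workCount[0]); // prints 3
    }
}
```

In a real handler you would accumulate character data and build your own objects in startElement/endElement, which is exactly the extra work SAX demands compared to JAXB.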
I've been doing a lot of research, particularly on parsing very large input sets conveniently. It's true that you can combine StAX and JAXB to selectively parse XML fragments, but it's not always possible or preferable. If you're interested in reading more on the topic, please have a look at:
http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf
In this document I describe an alternative approach that is very straightforward and convenient to use. It parses arbitrarily large input sets while giving you access to your data in a JavaBeans fashion.
You can try this too; it's not exactly good practice, but it works:
http://amitsavm.blogspot.in/2015/02/partially-parsing-xml-using-jaxb-by.html
Otherwise use StAX or SAX, or the approach Blaise Doughan describes, which is also good and can be considered the standard way. The link above may help if you have a complex XML structure and you don't want to annotate your classes manually, but would rather generate them with the XJC tool.
SAX, but you would have to build the Export object yourself.