
Is it possible to parse a large XML file of 800 MB using a SAX parser?

I am parsing TransXChange data, which includes some very large files of nearly 800 MB. When I try to parse these files, I get the following error:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at JourneyPatternSections.<init>(JourneyPatternSections.java:21)
    at ReadBusData.startElement(ReadBusData.java:131)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at ReadBusData.parseDocument(ReadBusData.java:51)
    at ReadBusData.<init>(ReadBusData.java:41)
    at ReadBusData.main(ReadBusData.java:218)

I am following this tutorial.
Can anybody help me?

Q: Is it possible to parse a large XML file of 800 MB using a SAX parser?

A: Yes, of course!

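The problem isn't SAX. SAX is actually an ideal choice for handling large files.

The problem clearly occurred with your ArrayList: the stack trace shows the OutOfMemoryError being thrown while an ArrayList is constructed in the JourneyPatternSections constructor, which is called from your startElement handler.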

How big is it?

How big are your other structures?

Do you actually need to store all the data you're allocating space for?

Are you running your program with any VM flags to allocate more memory?

How much memory does your PC have? Can you run it on a PC that supports more memory? A 64-bit PC?

Are you using a 64-bit JVM?
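A quick way to answer the last few questions is to have the JVM report its own limits. The following is only a minimal sketch: the class name `JvmMemoryInfo` is made up for illustration, and the `sun.arch.data.model` property is HotSpot-specific, so it may print `null` on other JVMs.

```java
/** Prints the JVM architecture and the maximum heap it is allowed to use. */
final class JvmMemoryInfo {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // os.arch is a standard system property, e.g. "amd64" or "x86".
        System.out.println("os.arch       = " + System.getProperty("os.arch"));
        // HotSpot-specific property ("32" or "64"); may be null on other JVMs.
        System.out.println("data model    = " + System.getProperty("sun.arch.data.model"));
        // Reflects the effective -Xmx setting (or the JVM default if none was given).
        System.out.println("max heap (MB) = " + maxHeapMegabytes());
    }
}
```

Running it with and without a flag such as `-Xmx1g` shows whether the setting actually reaches the JVM that runs your parser.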

SUGGESTION: Download and try out VisualVM to troubleshoot the problem at the code level.

You'll probably find that you're allocating far more data than you intended to.

IMHO...

Increase your heap size, e.g., launch the VM with -Xmx1g.

See this blog.

SAX is going to be your best mechanism for parsing a large file. DOM parsing will load the entire document into memory, and you'll run into problems. Chances are you are having issues because you are trying to store everything in a collection of some sort. SAX is great for parsing the XML, dealing with each piece, and moving on.
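The "deal with it and move on" approach can be sketched as a handler that passes each finished record to a callback and then drops it, rather than adding it to a list. This is only an illustration, not the asker's code: `StreamingHandler`, `StreamingDemo`, `countRecords`, and the element name `JourneyPatternSection` used in `main` are assumptions, and a record is reduced to its text content for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hands each record to a callback instead of collecting records in memory. */
class StreamingHandler extends DefaultHandler {
    private final String recordElement;   // hypothetical element name marking one record
    private final Consumer<String> sink;  // receives each finished record
    private StringBuilder text;           // non-null only while inside a record
    long recordsSeen;

    StreamingHandler(String recordElement, Consumer<String> sink) {
        this.recordElement = recordElement;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals(recordElement)) {
            text = new StringBuilder();   // start a fresh buffer for this record only
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && qName.equals(recordElement)) {
            sink.accept(text.toString().trim()); // process the record now
            text = null;                         // drop it so it can be garbage collected
            recordsSeen++;
        }
    }
}

class StreamingDemo {
    /** Parses the XML string and returns how many records were streamed to the sink. */
    static long countRecords(String xml, String recordElement, Consumer<String> sink) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            StreamingHandler handler = new StreamingHandler(recordElement, sink);
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.recordsSeen;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<JourneyPatternSections>"
                + "<JourneyPatternSection>a</JourneyPatternSection>"
                + "<JourneyPatternSection>b</JourneyPatternSection>"
                + "</JourneyPatternSections>";
        long n = countRecords(xml, "JourneyPatternSection", System.out::println);
        System.out.println(n + " records processed");
    }
}
```

For a real 800 MB file you would pass a `java.io.File` to `SAXParser.parse` instead of an in-memory string. Memory use then depends on the size of one record rather than the size of the whole document, as long as the callback itself does not accumulate everything.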

The error occurs while your code is building one of its own data structures. You need to either reduce how much memory you are using or increase the amount of memory available to your program.

One GB isn't that much these days. If you can give it 4 to 16 GB, processing the file will be much simpler.
