
Parsing Large XML File Using SAX

I am trying to parse an XML document. After some searching I found that SAX is the best choice, but the document is very large (1.5 GB). I waited about 7 minutes and it is still running. My question is: is that normal, or can I do better?

public static void main(String argv[]) {

    try {

        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {

            int c = 0;
            boolean id = false;
            boolean value = false;
            boolean orgin = false;
            boolean note = false;

            @Override
            public void startElement(String uri, String localName, String eName,
                    Attributes attributes) throws SAXException {

                if (eName.equalsIgnoreCase("ID")) {
                    id = true;
                }

                if (eName.equalsIgnoreCase("VALUE")) {
                    value = true;
                }

                if (eName.equalsIgnoreCase("ORGIN")) {
                    orgin = true;
                }

                if (eName.equalsIgnoreCase("NOTE")) {
                    note = true;
                }

            }

            @Override
            public void endElement(String uri, String localName,
                    String eName) throws SAXException {

            }

            @Override
            public void characters(char ch[], int start, int length) throws SAXException {

                if (id) {
                    System.out.println(new String(ch, start, length));
                    id = false;
                    System.out.println("record num : "+c++);
                }

                if (value) {
                    System.out.println(new String(ch, start, length));
                    value = false;
                }

                if (orgin) {
                    System.out.println(new String(ch, start, length));
                    orgin = false;
                }

                if (note) {
                    System.out.println(new String(ch, start, length));
                    note = false;
                }

            }

        };

        saxParser.parse("./transactions.xml", handler);

    } catch (Exception e) {
        e.printStackTrace();
    }

}
  1. You can save some time by changing equalsIgnoreCase to equals (unless you really do encounter "ValuE", "valUE", "VaLuE", ...).
  2. The printing is probably taking most of the time. I/O operations are usually the bottleneck.
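To illustrate both points, here is a minimal sketch rather than a drop-in fix. It assumes the element names from the question (ID, VALUE, ORGIN, NOTE) always appear in upper case and that those elements contain only text. The class name `BufferedSaxDemo` and the method `extract` are made up for this example. It matches names with an exact set lookup instead of equalsIgnoreCase and writes through a `BufferedWriter` instead of calling `System.out.println` for every value. It also accumulates text until `endElement`, because a SAX parser is allowed to deliver one element's text in several `characters()` calls.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BufferedSaxDemo {

    // Element names from the question; assumed to be upper case in the file.
    private static final Set<String> WANTED = Set.of("ID", "VALUE", "ORGIN", "NOTE");

    /** Writes the text of each wanted element on its own line; returns the number of ID elements seen. */
    public static int extract(InputStream in, Writer out) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] records = {0};
        parser.parse(in, new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean capturing = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (WANTED.contains(qName)) { // exact match, no case folding
                    capturing = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called more than once per element, so accumulate.
                if (capturing) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (capturing && WANTED.contains(qName)) {
                    try {
                        out.write(text.toString());
                        out.write('\n');
                    } catch (IOException e) {
                        throw new SAXException(e);
                    }
                    if (qName.equals("ID")) {
                        records[0]++;
                    }
                    capturing = false;
                }
            }
        });
        out.flush();
        return records[0];
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory sample; a real run would pass a FileInputStream for the large file.
        String xml = "<transactions><t><ID>1</ID><VALUE>9.5</VALUE>"
                + "<ORGIN>web</ORGIN><NOTE>ok</NOTE></t></transactions>";
        try (Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8))) {
            int n = extract(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), out);
            out.write("records: " + n + "\n");
        }
    }
}
```

Whether this is actually faster on the 1.5 GB file depends on where the output goes; timing a run with the printing removed entirely is a quick way to check how much of the 7 minutes is I/O.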

If you parse such a huge file, you should use StAX instead of SAX. With StAX you can skip parts of the file, which makes it faster.

StAX is a "pull" type of API. As discussed, there are Cursor and Event Iterator APIs. There are both reading and writing sides of the API. It is more developer friendly than SAX. StAX, like SAX, does not require an entire document to be held in memory. However, unlike SAX, an entire document need not be read. Portions can be skipped. This may result in even improved performance over SAX.

(DOM vs SAX XML parsing for large files)
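As a rough sketch of the "pull" style with the Cursor API, the code below uses `XMLStreamReader` from `javax.xml.stream` to collect the text of the first few ID elements and then simply stops asking for events, so the rest of the document is never read. The class name `StaxIdDemo`, the method `firstIds`, and the sample XML are invented for this example, and ID is assumed to contain only text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxIdDemo {

    /** Returns the text of at most `limit` ID elements, then stops pulling events. */
    public static List<String> firstIds(InputStream in, int limit) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> ids = new ArrayList<>();
        try {
            // The application drives the loop: once we have enough, we stop calling next().
            while (ids.size() < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "ID".equals(reader.getLocalName())) {
                    // getElementText() reads a text-only element up to its end tag.
                    ids.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transactions>"
                + "<t><ID>1</ID><NOTE>a</NOTE></t>"
                + "<t><ID>2</ID><NOTE>b</NOTE></t>"
                + "<t><ID>3</ID><NOTE>c</NOTE></t>"
                + "</transactions>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstIds(in, 2)); // prints [1, 2]
    }
}
```

Note that elements the loop ignores are still tokenized by the parser on the way past; the saving shown here comes from not handling them and from stopping early. If every record in the file has to be processed, the gain over SAX may be small.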
