
Java parsers testing

I am testing the efficiency of DOM, SAX and StAX.

Basically, I use the Spring StopWatch with XML files of different sizes and then compare the results.

I also thought about measuring the time it takes to load elements into objects and objects into an array, but that has nothing to do with parsing.

Here is my code for SAX:

    StopWatch stopWatch = new StopWatch("SAX");
    stopWatch.start("SAX");
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setValidating(false);
    SAXParser sp = spf.newSAXParser();
    XMLReader parser = sp.getXMLReader();
    parser.setErrorHandler(new Chyby()); // Chyby: custom ErrorHandler (definition not shown)
    parser.setContentHandler(new DefaultHandler()); // no-op handler: events are discarded
    parser.parse(file);
    stopWatch.stop();
    System.out.println(stopWatch.prettyPrint());

For StAX:

    int temp = 0; // number of parse events seen
    StopWatch stopWatch = new StopWatch("StAX");
    stopWatch.start("StAX");
    XMLInputFactory f = XMLInputFactory.newInstance();
    XMLStreamReader r = f.createXMLStreamReader(new FileInputStream(file));
    while (r.hasNext()) {
        temp++;
        r.next();
    }
    System.out.println("parsed");
    stopWatch.stop();
    System.out.println(stopWatch.prettyPrint());

For DOM:

    StopWatch stopWatch = new StopWatch("DOM");
    stopWatch.start("DOM");
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(subor); // subor: the XML input to parse
    System.out.println("parsed");
    System.out.println("----------------\n");
    stopWatch.stop();
    System.out.println(stopWatch.prettyPrint());

My question is: am I doing this right? Is there another approach to testing parsers? Thanks.

Creating JAXP factory classes is a very expensive operation, and its cost depends highly on what JARs are present on the classpath. You don't really want to measure that.
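
To see how much of a single timing is factory setup rather than parsing, the two phases can be timed separately. The following is only a sketch: the class name `FactoryCostDemo` and the tiny inline document are made up for illustration, and it uses `System.nanoTime()` instead of the Spring StopWatch so that it has no dependencies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the question's code.
class FactoryCostDemo {

    /** Returns {setupNanos, parseNanos} for one SAX parse of the given XML bytes. */
    static long[] measure(byte[] xml) throws Exception {
        long t0 = System.nanoTime();
        // The factory lookup (which may search the classpath) happens here.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser parser = spf.newSAXParser();
        long t1 = System.nanoTime();
        // Only this part is the actual parse.
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
        long t2 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1 };
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><item>1</item><item>2</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        long[] t = measure(xml);
        System.out.printf("factory + parser setup: %d ns, parse: %d ns%n", t[0], t[1]);
    }
}
```

On a first call in a fresh JVM the setup figure also includes class loading, so treat it as an upper bound rather than a precise cost.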

You need to take care to eliminate Java start-up costs. Parse a few documents before you start measuring. Run the measurements repeatedly, average the results, and check that the results are consistent.
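
A minimal sketch of that procedure for SAX could look like the following. The class name and the run counts (20 warm-up, 50 measured) are arbitrary assumptions, not recommendations, and the document is held in memory so that disk I/O does not enter the timings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness; names and run counts are illustrative only.
class WarmedUpSaxBenchmark {
    static final int WARMUP_RUNS = 20;
    static final int MEASURED_RUNS = 50;

    /** Parses the document once and returns the elapsed time in nanoseconds. */
    static long timeOneParse(SAXParser parser, byte[] xml) throws Exception {
        long start = System.nanoTime();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
        return System.nanoTime() - start;
    }

    /** Returns per-run timings (ns) taken after the warm-up runs were discarded. */
    static long[] benchmark(byte[] xml) throws Exception {
        // Factory and parser are created once, outside the timed region.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        for (int i = 0; i < WARMUP_RUNS; i++) {
            timeOneParse(parser, xml); // result ignored: lets the JIT warm up
        }
        long[] samples = new long[MEASURED_RUNS];
        for (int i = 0; i < MEASURED_RUNS; i++) {
            samples[i] = timeOneParse(parser, xml);
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><item>1</item><item>2</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        long[] samples = benchmark(xml);
        double mean = Arrays.stream(samples).average().orElse(0);
        long min = Arrays.stream(samples).min().getAsLong();
        long max = Arrays.stream(samples).max().getAsLong();
        // A large gap between min and max suggests the results are not yet consistent.
        System.out.printf("mean=%.0f ns, min=%d ns, max=%d ns%n", mean, min, max);
    }
}
```

For serious comparisons a dedicated harness such as JMH handles warm-up and statistics more rigorously than a hand-rolled loop.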

I would run the test with documents of different sizes. Typically the cost will be (ax+b) where x is the document size. The figure 'b' here represents the "per-document overhead" and can be quite significant if the documents are small.
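
Given timings for several document sizes, a and b can be estimated with an ordinary least-squares fit. This is a sketch only: the class name is invented, and the data in `main` is synthetic, generated from made-up coefficients purely to exercise the method. It is not a measurement.

```java
// Hypothetical helper for estimating per-byte cost (a) and per-document overhead (b).
class LinearCostFit {

    /** Least-squares fit of y = a*x + b; returns {a, b}. Needs at least two distinct x values. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double a = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double b = (sumY - a * sumX) / n;
        return new double[] { a, b };
    }

    public static void main(String[] args) {
        // Synthetic example: sizes in KB, times built from invented a and b.
        double[] sizesKb = { 10, 100, 1000, 10000 };
        double[] timesMs = new double[sizesKb.length];
        for (int i = 0; i < sizesKb.length; i++) {
            timesMs[i] = 0.02 * sizesKb[i] + 1.5;
        }
        double[] ab = fit(sizesKb, timesMs);
        System.out.printf("a = %.4f ms/KB, b = %.4f ms per document%n", ab[0], ab[1]);
    }
}
```

With real measurements, a large b relative to a*x for your typical document size means the per-document overhead dominates.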

In the case of DOM there may well be garbage collections occurring which can distort the results because they happen at unpredictable times. Forcing garbage collection at known times is sometimes recommended to get consistent measurements.
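
One way to do that is to request a collection just before each timed run, as in the sketch below. Note that `System.gc()` is only a request that the JVM may ignore, and the 50 ms pause is an arbitrary assumption; the class and method names are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper; System.gc() is a hint, not a guarantee.
class GcQuiescedTimer {

    /** Requests a GC, pauses briefly, then returns the task's elapsed time in nanoseconds. */
    static <T> long timeAfterGc(Callable<T> task) throws Exception {
        System.gc();
        Thread.sleep(50); // arbitrary pause to let a concurrent collector settle
        long start = System.nanoTime();
        task.call();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><item>1</item><item>2</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        long ns = timeAfterGc(() -> builder.parse(new ByteArrayInputStream(xml)));
        System.out.printf("DOM parse after GC request: %d ns%n", ns);
    }
}
```

Running the JVM with GC logging enabled is a useful cross-check, since it shows whether a collection still landed inside a timed run.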

You may want to factor the creation of the factories out of the performance run or measure them separately. You will probably want to touch all the data to prevent a parser from falsely looking good if it lazily builds objects.
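
For DOM, "touching all the data" can be as simple as walking the whole tree inside the timed region and folding what you see into a checksum. The sketch below assumes a hypothetical `TouchAllNodes` class; the checksum (node count plus the lengths of all text and attribute values) has no meaning beyond forcing every node to be materialised.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper; the checksum only exists to force full traversal.
class TouchAllNodes {

    /** Visits the node and all descendants; returns node count plus value lengths. */
    static long touch(Node node) {
        long checksum = 1; // count this node
        String value = node.getNodeValue(); // null for elements and the document node
        if (value != null) {
            checksum += value.length();
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                checksum += attrs.item(i).getNodeValue().length();
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            checksum += touch(c);
        }
        return checksum;
    }

    /** Parses the bytes with DOM and touches every node of the result. */
    static long parseAndTouch(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return touch(doc);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root id=\"r\"><item>1</item><item>2</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        // Printing the checksum keeps the JIT from discarding the traversal.
        System.out.println("checksum = " + parseAndTouch(xml));
    }
}
```

The same idea applies to StAX and SAX: read the text and attribute values of each event rather than just counting events, so all three parsers do comparable work.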
