Java parser testing

Question

I am testing the efficiency of DOM, SAX, and StAX.

Basically, I use Spring's StopWatch with XML files of different sizes and then compare the results.

I also thought I could measure the time while elements are loaded into objects and objects into an array, but that has nothing to do with parsing.

Here is my code for SAX:

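  import javax.xml.parsers.SAXParser;
  import javax.xml.parsers.SAXParserFactory;
  import org.springframework.util.StopWatch;
  import org.xml.sax.XMLReader;
  import org.xml.sax.helpers.DefaultHandler;

  StopWatch stopWatch = new StopWatch("SAX");
  stopWatch.start("SAX");
  // Factory and parser creation are inside the timed region here
  SAXParserFactory spf = SAXParserFactory.newInstance();
  spf.setValidating(false);
  SAXParser sp = spf.newSAXParser();
  XMLReader parser = sp.getXMLReader();
  parser.setErrorHandler(new Chyby());
  parser.setContentHandler(new DefaultHandler());
  parser.parse(file);
  stopWatch.stop();
  System.out.println(stopWatch.prettyPrint());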

For StAX:

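  import java.io.FileInputStream;
  import javax.xml.stream.XMLInputFactory;
  import javax.xml.stream.XMLStreamReader;
  import org.springframework.util.StopWatch;

  int temp = 0; // counts the events pulled from the reader
  StopWatch stopWatch = new StopWatch("StAX");
  stopWatch.start("StAX");
  XMLInputFactory f = XMLInputFactory.newInstance();
  XMLStreamReader r = f.createXMLStreamReader(new FileInputStream(file));
  while (r.hasNext()) {
    temp++;
    r.next();
  }
  System.out.println("parsed");
  stopWatch.stop();
  System.out.println(stopWatch.prettyPrint());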

For DOM:

  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.springframework.util.StopWatch;
  import org.w3c.dom.Document;

  StopWatch stopWatch = new StopWatch("DOM");
  stopWatch.start("DOM");
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document document = builder.parse(subor);
  System.out.println("parsed");
  System.out.println("----------------\n");
  stopWatch.stop();
  System.out.println(stopWatch.prettyPrint());

My question is: am I doing it right? Is there another approach to testing parsers? Thanks.

Answer 1

Creating JAXP factory classes is a very expensive operation, and its cost depends heavily on which JARs are present on the classpath. You don't really want to measure that.

You need to take care to eliminate Java start-up costs. Parse a few documents before you start measuring. Run the measurements repeatedly, average the results, and check that the results are consistent.
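As a rough illustration of the two points above, the sketch below creates the SAX factory once outside the timed region, runs a few untimed warm-up parses, and then averages several timed parses. The class and method names (`SaxTiming`, `sampleXml`, `averageParseNanos`) and the synthetic in-memory document are assumptions made for the example, not part of the question's code, and `System.nanoTime()` stands in for the Spring StopWatch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTiming {
    /** Builds a synthetic document with the given number of item elements (hypothetical test data). */
    static byte[] sampleXml(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">text ").append(i).append("</item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Mean parse time in nanoseconds over measuredRuns, after warmupRuns untimed parses. */
    static double averageParseNanos(byte[] xml, int warmupRuns, int measuredRuns) {
        try {
            // Factory lookup happens once and is never timed.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            DefaultHandler handler = new DefaultHandler();

            // Warm-up: let class loading and JIT compilation happen before measuring.
            for (int i = 0; i < warmupRuns; i++) {
                factory.newSAXParser().parse(new ByteArrayInputStream(xml), handler);
            }

            long total = 0;
            for (int i = 0; i < measuredRuns; i++) {
                SAXParser parser = factory.newSAXParser(); // created before the clock starts
                long start = System.nanoTime();
                parser.parse(new ByteArrayInputStream(xml), handler);
                total += System.nanoTime() - start;
            }
            return (double) total / measuredRuns;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        double avg = averageParseNanos(sampleXml(10_000), 20, 50);
        System.out.printf("average SAX parse: %.0f ns%n", avg);
    }
}
```

Parsing from a byte array also keeps disk I/O out of the numbers; whether that is desirable depends on what the comparison is meant to show.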

I would run the test with documents of different sizes. Typically the cost will be (ax + b), where x is the document size. The figure 'b' here represents the per-document overhead and can be quite significant if the documents are small.
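One way to estimate a and b from a set of measurements is an ordinary least-squares line fit. The sketch below is only an illustration of that idea; the class name `LinearCostFit` and the sample numbers in `main` are made up for the example and are not measured results.

```java
final class LinearCostFit {
    /** Returns {a, b} minimizing the squared error of time = a * size + b. */
    static double[] fit(double[] sizes, double[] times) {
        if (sizes.length != times.length || sizes.length < 2) {
            throw new IllegalArgumentException("need at least two (size, time) pairs");
        }
        int n = sizes.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += sizes[i];
            sumY += times[i];
            sumXX += sizes[i] * sizes[i];
            sumXY += sizes[i] * times[i];
        }
        double denom = n * sumXX - sumX * sumX;
        if (denom == 0) {
            throw new IllegalArgumentException("document sizes must not all be equal");
        }
        double a = (n * sumXY - sumX * sumY) / denom; // cost per unit of document size
        double b = (sumY - a * sumX) / n;             // per-document overhead
        return new double[] {a, b};
    }

    public static void main(String[] args) {
        // Illustrative numbers only: sizes in KB, times in ms.
        double[] sizes = {1, 2, 3, 4};
        double[] times = {5, 7, 9, 11};
        double[] ab = fit(sizes, times);
        System.out.println("a = " + ab[0] + ", b = " + ab[1]);
    }
}
```

If b dominates for the document sizes that matter in practice, the per-document setup cost is what the comparison between parsers is really measuring.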

In the case of DOM, garbage collections may well occur, which can distort the results because they happen at unpredictable times. Forcing garbage collection at known times is sometimes recommended to get consistent measurements.
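A minimal sketch of that suggestion, assuming an in-memory document and hypothetical names (`DomTiming`, `parseTimesNanos`): it requests a collection before each timed DOM parse and returns the individual timings so their spread can be inspected. Note that `System.gc()` is only a request to the JVM, so this reduces the chance of a collection landing inside the timed region rather than ruling it out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class DomTiming {
    /** Times each DOM parse separately, asking for a GC before the clock starts. */
    static long[] parseTimesNanos(byte[] xml, int runs) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            long[] times = new long[runs];
            for (int i = 0; i < runs; i++) {
                DocumentBuilder builder = factory.newDocumentBuilder();
                System.gc(); // a hint only; the JVM may ignore it
                long start = System.nanoTime();
                Document doc = builder.parse(new ByteArrayInputStream(xml));
                times[i] = System.nanoTime() - start;
                // Use the result so the parse cannot be treated as dead code.
                if (doc.getDocumentElement() == null) {
                    throw new IllegalStateException("document has no root element");
                }
            }
            return times;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<root><a>1</a><b>2</b></root>".getBytes(StandardCharsets.UTF_8);
        for (long t : parseTimesNanos(xml, 5)) {
            System.out.println(t + " ns");
        }
    }
}
```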

Answer 2

You may want to factor the creation of the factories out of the performance run, or measure it separately. You will probably also want to touch all the data, to prevent a parser from falsely looking good if it builds objects lazily.
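For StAX, "touching all the data" could look something like the sketch below, which reads every element name, attribute value, and text chunk instead of only calling `next()`. The class and method names (`StaxTouchAll`, `consume`) are hypothetical, and returning the number of characters read is just one simple way to make sure the work cannot be skipped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxTouchAll {
    /** Reads every name, attribute value, and text chunk; returns the characters touched. */
    static long consume(byte[] xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader r = factory.createXMLStreamReader(new ByteArrayInputStream(xml));
            long touched = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    touched += r.getLocalName().length();
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        touched += r.getAttributeValue(i).length();
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    touched += r.getText().length(); // forces the text to be materialized
                }
            }
            r.close();
            return touched;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<a x=\"12\"><b>hey</b></a>".getBytes(StandardCharsets.UTF_8);
        System.out.println("characters touched: " + consume(xml));
    }
}
```

The same idea applies to DOM (walk the whole tree) and SAX (do something with the arguments in the handler callbacks), so that all three parsers are compared on equivalent work.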

Answer 1
Score 2, posted 2013-05-02 06:51:05

Answer 2
Score 1, accepted, posted 2013-05-02 00:39:21