What pattern should I use for using a SAX parser?

<xml>
<Office prop1="prop1" prop2="prop2">
    <Version major="1" minor="0"/>
    <Label>MyObjectA</Label>
    <Active>No</Active>
</Office>
<Vehicle prop="prop">
    <Wheels>4</Wheels>
    <Brand>Honda</Brand>
    <Bought>No</Bought>
</Vehicle>
</xml>

My XML is in this format. I am using a SAX parser to parse this file because the XML file can be large.

What pattern should I follow to parse the file?

Usually I have been following this approach:

// Pseudocode
if (start) {
    if (type Office) {
        create an instance of type Office and populate the attributes of Office in the Office class using a callback
    }
    if (type Vehicle) {
        create an instance of type Vehicle and populate the attributes of Vehicle in the Vehicle class using a callback
    }
}

if (end) {
    // do cleaning up
}

With this approach, the parsing function that handles the start and end tags usually becomes huge. Is there a better approach I could follow?

I had good experience with this approach:

  1. Create a lookup table that maps node names to handler functions. You'll most likely need to maintain two handlers per node name, one for the start tag and one for the end tag.
  2. Maintain a stack of the parent nodes.
  3. Call the handler from the lookup table.
  4. Each handler function can do its tasks without further checks. If necessary, though, a handler can also determine the current context by looking at the parent node stack. That becomes important if you have nodes with the same name at different places in the node hierarchy.

Some pseudo-Java code:

import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

// Callback type stored in the lookup tables
private interface MyCallbackAdapter {
   void execute();
}

private final Map<String, MyCallbackAdapter> startLookup = new HashMap<String, MyCallbackAdapter>();
private final Map<String, MyCallbackAdapter> endLookup = new HashMap<String, MyCallbackAdapter>();
private final Stack<String> nodeStack = new Stack<String>();

public MyHandler() {
   // Initialize the lookup tables
   startLookup.put("Office", new MyCallbackAdapter() {
      public void execute() { myOfficeStart(); }
    });

   endLookup.put("Office", new MyCallbackAdapter() {
      public void execute() { myOfficeEnd(); }
    });
}

@Override
public void startElement(String namespaceURI, String localName,
        String qName, Attributes atts) {
  // localName is empty unless the parser is namespace-aware, so fall back to qName
  String name = localName.isEmpty() ? qName : localName;
  nodeStack.push(name);

  MyCallbackAdapter callback = startLookup.get(name);
  if (callback != null)
    callback.execute();
}

@Override
public void endElement(String namespaceURI, String localName, String qName) {
  String name = localName.isEmpty() ? qName : localName;

  MyCallbackAdapter callback = endLookup.get(name);
  if (callback != null)
    callback.execute();

  // Pop only after the end handler has run, so it can still inspect the stack
  nodeStack.pop();
}

private void myOfficeStart() {
  // Do the stuff necessary for the "Office" start tag
}

private void myOfficeEnd() {
  // Do the stuff necessary for the "Office" end tag
}

//...

}
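
The context check from step 4 could look roughly like the sketch below. It is only an illustration: ContextAwareHandler and labelsWithParents are made-up names, and it assumes a hypothetical document in which a Label element can appear under both Office and Vehicle. It reads qName because the parser created by a default SAXParserFactory is not namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every <Label> together with its parent element name.
class ContextAwareHandler extends DefaultHandler {
    private final Deque<String> nodeStack = new ArrayDeque<>();
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Before pushing, the top of the stack is the parent of the element
        // being opened (null when the element is the document root).
        String parent = nodeStack.peek();
        if ("Label".equals(qName)) {
            seen.add(parent + "/" + qName);
        }
        nodeStack.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        nodeStack.pop();
    }

    // Convenience helper (an assumption of this sketch): parse a string and
    // return the recorded "parent/Label" entries in document order.
    static List<String> labelsWithParents(String xml) {
        ContextAwareHandler handler = new ContextAwareHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.seen;
    }
}
```

A handler for Label can then branch on the parent name instead of needing a separate flag for every possible enclosing element.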

General advice: Depending on your requirements, you might need further contextual information, such as the previous node name or whether the current node is empty. If you find yourself adding more and more contextual information, you might consider switching to a full-fledged DOM parser, unless runtime speed is more important than development speed.
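
As one example of such contextual information, here is a rough sketch of how a handler might detect empty nodes and collect the text of leaf nodes. The names TextCollectingHandler, Frame, leafText and emptyElements are invented for this illustration, and "empty" is assumed to mean "no child elements and no non-whitespace text".

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps one small state object per open element.
class TextCollectingHandler extends DefaultHandler {
    private static final class Frame {
        final StringBuilder text = new StringBuilder();
        boolean hasChild;
    }

    private final Deque<Frame> frames = new ArrayDeque<>();
    final Map<String, String> leafText = new LinkedHashMap<>();
    final List<String> emptyElements = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Frame parent = frames.peek();
        if (parent != null) {
            parent.hasChild = true; // the enclosing element is not a leaf
        }
        frames.push(new Frame());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text of one element in several chunks, so append.
        frames.peek().text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Frame frame = frames.pop();
        if (frame.hasChild) {
            return; // only leaf elements are recorded in this sketch
        }
        String content = frame.text.toString().trim();
        if (content.isEmpty()) {
            emptyElements.add(qName);
        } else {
            leafText.put(qName, content);
        }
    }

    // Convenience helper (an assumption of this sketch): parse a string and
    // return the populated handler.
    static TextCollectingHandler parse(String xml) {
        TextCollectingHandler handler = new TextCollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler;
    }
}
```

Every extra piece of state like this is something a DOM parser would give you for free, which is the trade-off mentioned above.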

If you want to stick with the explicit SAX approach, DR's answer makes sense. I've used this approach in the past with success.

However, you may want to take a look at Commons Digester, which allows you to specify an object to be created and populated for subtrees of an XML document. It's a very easy way to build an object hierarchy from XML without using the SAX model explicitly.

See this ONJava article for more info.

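You can create a lookup table from type to parse action; then you only need to index into the lookup table to find the appropriate parse action.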
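
A minimal sketch of that idea, applied to the Office and Vehicle elements from the question. ElementDispatcher is a made-up name, and the parse actions simply build strings as stand-ins for real Office and Vehicle objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one parse action per element type, selected by name.
class ElementDispatcher extends DefaultHandler {
    private final Map<String, Function<Attributes, String>> actions = new HashMap<>();
    final List<String> parsed = new ArrayList<>();

    ElementDispatcher() {
        // Each action reads the attributes it needs right away, because the
        // Attributes object is only valid during the startElement callback.
        actions.put("Office", a ->
                "Office(prop1=" + a.getValue("prop1") + ", prop2=" + a.getValue("prop2") + ")");
        actions.put("Vehicle", a ->
                "Vehicle(prop=" + a.getValue("prop") + ")");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Index into the table instead of chaining if/else checks on the type.
        Function<Attributes, String> action = actions.get(qName);
        if (action != null) {
            parsed.add(action.apply(atts));
        }
    }

    // Convenience helper (an assumption of this sketch): parse a string and
    // return the results of the actions in document order.
    static List<String> parse(String xml) {
        ElementDispatcher handler = new ElementDispatcher();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.parsed;
    }
}
```

Supporting a new element type then means adding one entry to the table rather than another branch in startElement.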

You need a lexical analyzer; the Interpreter pattern is an ideal pattern for writing one.
