
Using SAX to parse common XML elements

I'm currently using SAX (Java) to parse a handful of different XML documents, each representing different data and having a slightly different structure. For this reason, each XML document is handled by a different SAX class (subclassing DefaultHandler).

However, some XML structures can appear in all of these documents. Ideally, I'd like to tell the parser: "Hey, when you reach a complex_node element, just use ComplexNodeHandler to read it, and give me back the result. If you reach a some_other_node, use OtherNodeHandler to read it and give me back that result."

However, I can't see an obvious way to do this.

Should I just write one monolithic handler class that can read all the different documents I have (and so eliminate the duplicated code), or is there a smarter way to handle this?

Below is an answer I gave to a similar question (Skipping nodes with sax). It demonstrates how to swap content handlers on an XMLReader.

In this example the swapped-in ContentHandler simply ignores all events until it gives up control, but you could easily adapt the concept.


You could do something like the following:

import javax.xml.parsers.SAXParser; 
import javax.xml.parsers.SAXParserFactory; 
import org.xml.sax.XMLReader; 

public class Demo { 

    public static void main(String[] args) throws Exception { 
        SAXParserFactory spf = SAXParserFactory.newInstance(); 
        SAXParser sp = spf.newSAXParser(); 
        XMLReader xr = sp.getXMLReader(); 
        xr.setContentHandler(new MyContentHandler(xr)); 
        xr.parse("input.xml"); 
    } 
} 

MyContentHandler

This class is responsible for processing your XML document. When you hit a node you want to ignore, you can swap in the IgnoringContentHandler, which will swallow all events for that node.

import org.xml.sax.Attributes; 
import org.xml.sax.ContentHandler; 
import org.xml.sax.Locator; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 

public class MyContentHandler implements ContentHandler { 

    private XMLReader xmlReader; 

    public MyContentHandler(XMLReader xmlReader) { 
        this.xmlReader = xmlReader; 
    } 

    public void setDocumentLocator(Locator locator) { 
    } 

    public void startDocument() throws SAXException { 
    } 

    public void endDocument() throws SAXException { 
    } 

    public void startPrefixMapping(String prefix, String uri) 
            throws SAXException { 
    } 

    public void endPrefixMapping(String prefix) throws SAXException { 
    } 

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException { 
        if("sodium".equals(qName)) { 
            // Hand the whole subtree of this element to a handler that discards it.
            xmlReader.setContentHandler(new IgnoringContentHandler(xmlReader, this));
        } else { 
            System.out.println("START " + qName); 
        } 
    } 

    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
        System.out.println("END " + qName); 
    } 

    public void characters(char[] ch, int start, int length) 
            throws SAXException { 
        System.out.println(new String(ch, start, length)); 
    } 

    public void ignorableWhitespace(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void processingInstruction(String target, String data) 
            throws SAXException { 
    } 

    public void skippedEntity(String name) throws SAXException { 
    } 

} 

IgnoringContentHandler

When the IgnoringContentHandler is done swallowing events, it passes control back to your main ContentHandler.

import org.xml.sax.Attributes; 
import org.xml.sax.ContentHandler; 
import org.xml.sax.Locator; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 

public class IgnoringContentHandler implements ContentHandler { 

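    // Starts at 1 because the main handler already consumed the opening tag
    // of the element being ignored.
    private int depth = 1;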
    private XMLReader xmlReader; 
    private ContentHandler contentHandler; 

    public IgnoringContentHandler(XMLReader xmlReader, ContentHandler contentHandler) { 
        this.contentHandler = contentHandler; 
        this.xmlReader = xmlReader; 
    } 

    public void setDocumentLocator(Locator locator) { 
    } 

    public void startDocument() throws SAXException { 
    } 

    public void endDocument() throws SAXException { 
    } 

    public void startPrefixMapping(String prefix, String uri) 
            throws SAXException { 
    } 

    public void endPrefixMapping(String prefix) throws SAXException { 
    } 

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException { 
        depth++; 
    } 

    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
        depth--; 
        if(0 == depth) {
            // The ignored element has closed: give control back to the main handler.
            xmlReader.setContentHandler(contentHandler);
        }
    } 

    public void characters(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void ignorableWhitespace(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void processingInstruction(String target, String data) 
            throws SAXException { 
    } 

    public void skippedEntity(String name) throws SAXException { 
    } 

} 
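
As one possible adaptation of the handler swap to the original question, the sketch below replaces the ignoring handler with one that collects the text of a complex_node subtree and hands it back through a callback before restoring the parent handler. The names CollectingContentHandler, DocumentHandler, SwapDemo and the Consumer<String> callback are illustrative assumptions; they are not part of SAX or of the code above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: reads one subtree, reports its text, then restores the parent. */
class CollectingContentHandler extends DefaultHandler {
    private final XMLReader xmlReader;
    private final ContentHandler parent;
    private final Consumer<String> onResult; // assumed callback for "give me back the result"
    private final StringBuilder text = new StringBuilder();
    private int depth = 1; // the parent already consumed the opening tag

    CollectingContentHandler(XMLReader xmlReader, ContentHandler parent,
            Consumer<String> onResult) {
        this.xmlReader = xmlReader;
        this.parent = parent;
        this.onResult = onResult;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (depth == 0) {
            // Subtree finished: deliver the result and give control back.
            onResult.accept(text.toString().trim());
            xmlReader.setContentHandler(parent);
        }
    }
}

/** Hypothetical document-specific handler that delegates complex_node subtrees. */
class DocumentHandler extends DefaultHandler {
    final List<String> complexNodes = new ArrayList<>();
    private final XMLReader xmlReader;

    DocumentHandler(XMLReader xmlReader) {
        this.xmlReader = xmlReader;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("complex_node".equals(qName)) {
            xmlReader.setContentHandler(
                    new CollectingContentHandler(xmlReader, this, complexNodes::add));
        }
    }
}

class SwapDemo {
    /** Parses an in-memory document and returns the text of each complex_node. */
    static List<String> parse(String xml) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DocumentHandler handler = new DocumentHandler(xr);
        xr.setContentHandler(handler);
        xr.parse(new InputSource(new StringReader(xml)));
        return handler.complexNodes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<doc><complex_node><a>1</a><b>2</b></complex_node><x/></doc>"));
    }
}
```

Each document-specific handler could reuse the same collecting handler this way, so the logic for the shared element lives in one class.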

You could have one handler (ComplexNodeHandler) that handles only some parts of a document (complex_node) and passes everything else to another handler. The constructor for ComplexNodeHandler would take the other handler as a parameter. I mean something like this:

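import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ComplexNodeHandler extends DefaultHandler {

    private final ContentHandler handlerForOtherNodes;

    // Greater than 0 while inside a complex_node subtree; the matching
    // endElement (elided here) has to decrement it again.
    private int complexDepth = 0;

    public ComplexNodeHandler(ContentHandler handlerForOtherNodes) {
        this.handlerForOtherNodes = handlerForOtherNodes;
    }

    ...

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (complexDepth > 0 || "complex_node".equals(qName)) {
            complexDepth++;
            // [handle complex node data]
        } else {
            // pass the event to the document-specific handler
            handlerForOtherNodes.startElement(uri, localName, qName, atts);
        }
    }

    ...

}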
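
To make the outline above concrete, here is a minimal runnable sketch of the same delegation idea. It assumes the shared element is named complex_node, tracks nesting with a depth counter, and simply records element names as a stand-in for real complex-node handling. DelegationDemo, getComplexElements and the anonymous document handler are hypothetical names added for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Handles complex_node subtrees itself and forwards every other event. */
class ComplexNodeHandler extends DefaultHandler {
    private final ContentHandler handlerForOtherNodes;
    private final List<String> complexElements = new ArrayList<>();
    private int complexDepth = 0; // > 0 while inside a complex_node subtree

    ComplexNodeHandler(ContentHandler handlerForOtherNodes) {
        this.handlerForOtherNodes = handlerForOtherNodes;
    }

    List<String> getComplexElements() {
        return complexElements;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (complexDepth > 0 || "complex_node".equals(qName)) {
            complexDepth++;
            complexElements.add(qName); // stand-in for real complex-node handling
        } else {
            handlerForOtherNodes.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (complexDepth > 0) {
            complexDepth--;
        } else {
            handlerForOtherNodes.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (complexDepth == 0) {
            handlerForOtherNodes.characters(ch, start, length);
        }
    }
}

class DelegationDemo {
    /** Returns two lists: elements kept by the wrapper, elements seen by the document handler. */
    static List<List<String>> parse(String xml) throws Exception {
        List<String> others = new ArrayList<>();
        DefaultHandler documentHandler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes atts) {
                others.add(qName);
            }
        };
        ComplexNodeHandler wrapper = new ComplexNodeHandler(documentHandler);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), wrapper);
        return Arrays.asList(wrapper.getComplexElements(), others);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<doc><complex_node><a/></complex_node><item/></doc>"));
    }
}
```

Unlike the handler swap, this version never touches the XMLReader: the wrapper is registered once and decides per event whether to forward it.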

There may well be better alternatives, since I'm not that familiar with SAX. Writing a base handler for the common parts and inheriting from it could work too, but I'm not sure whether using inheritance here is a good idea.
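
For comparison, the inheritance variant might look roughly like the sketch below: an abstract base class owns the complex_node logic and exposes hooks that each document-specific subclass fills in. The class names (CommonElementsHandler, InvoiceHandler, InheritanceDemo) and the hook methods are assumptions made for this example, and the sketch deliberately drops character data outside complex_node to stay short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical base class that owns the parsing of the shared complex_node element. */
abstract class CommonElementsHandler extends DefaultHandler {
    private final StringBuilder complexText = new StringBuilder();
    private int complexDepth = 0; // > 0 while inside a complex_node subtree

    @Override
    public final void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        if (complexDepth > 0 || "complex_node".equals(qName)) {
            complexDepth++;
        } else {
            startDocumentElement(qName, atts);
        }
    }

    @Override
    public final void characters(char[] ch, int start, int length) {
        if (complexDepth > 0) {
            complexText.append(ch, start, length);
        }
    }

    @Override
    public final void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (complexDepth > 0 && --complexDepth == 0) {
            // The shared element has closed: report its text to the subclass.
            onComplexNode(complexText.toString());
            complexText.setLength(0);
        }
    }

    /** Hook for elements that are specific to one kind of document. */
    protected abstract void startDocumentElement(String qName, Attributes atts)
            throws SAXException;

    /** Hook called once per complex_node with its collected text. */
    protected abstract void onComplexNode(String text) throws SAXException;
}

/** Hypothetical handler for one document type. */
class InvoiceHandler extends CommonElementsHandler {
    final List<String> events = new ArrayList<>();

    @Override
    protected void startDocumentElement(String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    protected void onComplexNode(String text) {
        events.add("complex:" + text);
    }
}

class InheritanceDemo {
    static List<String> parse(String xml) throws Exception {
        InvoiceHandler handler = new InvoiceHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<invoice><complex_node>abc</complex_node><total/></invoice>"));
    }
}
```

The trade-off is the usual one: every document handler is tied to a single base class, whereas the delegation approach lets the shared logic be combined with any ContentHandler.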


 