
How does XML parsing work inside a SAX parser?

I'm trying to parse XML using SAX. Below is a code snippet:

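import java.util.Arrays;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadXML {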

   public static void main(String argv[]) {

    try {

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();

    DefaultHandler handler = new DefaultHandler() {

    boolean bfname = false;
    boolean blname = false;
    boolean bnname = false;
    boolean bsalary = false;

    public void startElement(String uri, String localName,String qName, 
                Attributes attributes) throws SAXException {

        System.out.println("Parameters :" + uri +":"+ localName +":"+ qName +":"+ attributes);
        System.out.println("Start Element :" + qName);

        if (qName.equalsIgnoreCase("FIRSTNAME")) {
            bfname = true;
        }

        if (qName.equalsIgnoreCase("LASTNAME")) {
            blname = true;
        }

        if (qName.equalsIgnoreCase("NICKNAME")) {
            bnname = true;
        }

        if (qName.equalsIgnoreCase("SALARY")) {
            bsalary = true;
        }

    }

    public void endElement(String uri, String localName,
        String qName) throws SAXException {

        System.out.println("End Element :" + qName);

    }

    public void characters(char[] ch, int start, int length) throws SAXException {

        System.out.println("Im here:"+Arrays.toString(ch));
        if (bfname) {
            System.out.println("First Name : " + new String(ch, start, length));
            bfname = false;
        }

        if (blname) {
            System.out.println("Last Name : " + new String(ch, start, length));
            blname = false;
        }

        if (bnname) {
            System.out.println("Nick Name : " + new String(ch, start, length));
            bnname = false;
        }

        if (bsalary) {
            System.out.println("Salary : " + new String(ch, start, length));
            bsalary = false;
        }

    }

     };

       saxParser.parse("C:\\Lenny\\Work\\XML\\SaxParsing_01.xml", handler); // (1)


     } catch (Exception e) {
       e.printStackTrace();
     }

   }

}

My first question: when execution reaches the saxParser.parse(...) call marked (1), the following two methods get called:

    public void parse(File f, HandlerBase hb)
            throws SAXException, IOException {
            if (f == null) {
                throw new IllegalArgumentException("File cannot be null");
            }

            String escapedURI = FilePathToURI.filepath2URI(f.getAbsolutePath());
            if (DEBUG) {
                System.out.println("Escaped URI = " + escapedURI);
            }
            InputSource input = new InputSource(escapedURI);
            this.parse(input, hb);
        }

public void parse(InputSource is, DefaultHandler dh)
        throws SAXException, IOException {
        if (is == null) {
            throw new IllegalArgumentException("InputSource cannot be null");
        }
        XMLReader reader = this.getXMLReader();
        if (dh != null) {
            reader.setContentHandler(dh);
            reader.setEntityResolver(dh);
            reader.setErrorHandler(dh);
            reader.setDTDHandler(dh);
        }
        reader.parse(is);
    }

I'm curious what happens internally when reader.parse(is) is called. My only assumption is that the reader reads the XML, feeds it into the DefaultHandler instance created in the code above, and the output is produced accordingly.

I've tried hard to find the source code of the parse(is) method, but couldn't. In the SAXParser class, parse is an abstract method, so I can't find the implementation class where I could read the source and understand further.

A second, perhaps silly, question: when we create the DefaultHandler instance, are the methods inside that block overriding DefaultHandler's methods? And are we allowed to declare variables inside the constructor's block, as with the four boolean variables here? I've never seen this approach in Java.

Can anyone help me with this?

Thanks

SAXParser is an abstract class (and the XMLReader it wraps is an interface), and many XML parsers provide implementations of it. If you want to know how it works, you will need to pick one of these XML parsers; the most accessible is the Apache Xerces parser. You could start here, but be warned that it's not easy reading:

https://apache.googlesource.com/xerces2-j/+/f59f47412e404f4984480a45a99957ac07d287d4/src/org/apache/xerces/parsers/AbstractSAXParser.java
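If you are unsure which implementation your own runtime picked, you can ask the objects themselves. The sketch below is only an illustration (the class name WhichParser is made up); it prints the concrete classes behind the factory, the SAXParser, and its XMLReader. On many JDKs these turn out to be Xerces-derived classes bundled with the JDK, but the result depends on your runtime and classpath.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of the question's code.
public class WhichParser {

    // Returns the concrete class names that JAXP selected at runtime:
    // [0] factory, [1] SAXParser implementation, [2] XMLReader implementation.
    static String[] implementationNames() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        XMLReader reader = saxParser.getXMLReader();
        return new String[] {
            factory.getClass().getName(),
            saxParser.getClass().getName(),
            reader.getClass().getName()
        };
    }

    public static void main(String[] args) throws Exception {
        String[] names = implementationNames();
        System.out.println("Factory   : " + names[0]);
        System.out.println("SAXParser : " + names[1]);
        System.out.println("XMLReader : " + names[2]);
    }
}
```

Once you know the class name of the XMLReader, you can look up that class's parse(InputSource) method in the corresponding source tree.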

In very simple terms, the parser looks for a "<", and when it finds one, it calls the supplied ContentHandler's startElement() method with the appropriate parameters.
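To make that concrete, here is a deliberately tiny toy, not how Xerces is written: a loop that scans a string, and pushes startElement(), characters() and endElement() calls to whatever ContentHandler it is given. The class ToyPushParser and the sample XML are made up for illustration, and the toy assumes only plain <name>, </name> tags and text (no attributes, namespaces, comments, entities, or well-formedness checks).

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Toy illustration of "push" parsing; a real parser does far more.
public class ToyPushParser {

    // Scans the text once and calls back into the handler for each event.
    static void parse(String xml, ContentHandler handler) throws SAXException {
        char[] buffer = xml.toCharArray(); // one shared buffer, like a real parser's
        handler.startDocument();
        int pos = 0;
        while (pos < xml.length()) {
            if (xml.charAt(pos) == '<') {
                int close = xml.indexOf('>', pos);
                if (close < 0) {
                    throw new SAXException("Unterminated tag at offset " + pos);
                }
                String tag = xml.substring(pos + 1, close);
                if (tag.startsWith("/")) {
                    handler.endElement("", "", tag.substring(1));
                } else {
                    handler.startElement("", "", tag, new AttributesImpl());
                }
                pos = close + 1;
            } else {
                int next = xml.indexOf('<', pos);
                if (next < 0) {
                    next = xml.length();
                }
                // The handler gets the whole buffer plus the valid range.
                handler.characters(buffer, pos, next - pos);
                pos = next;
            }
        }
        handler.endDocument();
    }

    // Runs the toy parser with a handler that records every callback.
    static String trace(String xml) throws SAXException {
        StringBuilder log = new StringBuilder();
        parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                log.append("start:").append(qName).append('\n');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.append("end:").append(qName).append('\n');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append("text:").append(ch, start, length).append('\n');
            }
        });
        return log.toString();
    }

    public static void main(String[] args) throws SAXException {
        // Hypothetical sample input.
        System.out.print(trace("<employee><firstname>Ann</firstname></employee>"));
    }
}
```

Note that characters() receives the parser's whole buffer together with a start offset and a length. That is why printing Arrays.toString(ch), as the question's handler does, shows much more than the current text: only the range from start to start + length belongs to the event.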

You don't actually need to understand how it works internally in order to make successful use of a SAX parser, though well done for trying.

You're right that writing a SAX ContentHandler (perhaps as an extension of DefaultHandler) involves a rather different style of Java programming than you may be used to. Because your code is processing events through callbacks, you can't maintain the current state on the stack in the way that you would if you owned the main control loop. Rather, you have to think about how each call on a method such as startElement() or characters() affects the current state that your application needs to maintain, and work out how to modify the data structure that holds this state. It's a rather different way of programming, and is one of the reasons why some people say that "pull" parsing interfaces are easier to use than "push" interfaces.
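As a rough sketch of what "holding the state yourself" can look like: the new DefaultHandler() { ... } expression in the question is an anonymous subclass of DefaultHandler, so its boolean fields and overriding methods are ordinary class members rather than anything special to a constructor. The example below writes the same idea as a named subclass. The class name EmployeeHandler, the element names, and the sample XML are assumptions made for illustration. It keeps the open elements and the text seen so far in fields, and appends in characters() because a parser is allowed to deliver one piece of text in several calls.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records "path = text" for every element with text.
public class EmployeeHandler extends DefaultHandler {

    // State that must survive between callbacks lives in fields.
    private final Deque<String> openElements = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        openElements.addLast(qName);
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per text node, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String path = String.join("/", openElements);
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            results.add(path + " = " + value);
        }
        openElements.removeLast();
        text.setLength(0);
    }

    public List<String> results() {
        return results;
    }

    // Parses an in-memory document with whatever SAX parser JAXP provides.
    static List<String> parse(String xml) throws Exception {
        EmployeeHandler handler = new EmployeeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<company><staff><firstname>Ann</firstname>"
                + "<salary>100</salary></staff></company>";
        parse(xml).forEach(System.out::println);
    }
}
```

Compared with one boolean flag per element, a stack of open elements plus a text buffer scales to any number of element names and copes with text that arrives in pieces.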
