
How to preserve comments and the XML declaration when parsing XML files using SAX?

I have a simple task:

I'd like to read an XML file and return it as completely as possible. With the following code, two problems remain:

  1. Comments are removed
  2. I have no access to the XML declaration

Java code:

package com.stackoverflow.tests;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParsing {

  public static void main(String[] args) {

    StringBuffer b = new StringBuffer();

    try {

      SAXParserFactory factory = SAXParserFactory.newInstance();
      SAXParser saxParser = factory.newSAXParser();

      DefaultHandler handler = new DefaultHandler() {

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {

          b.append("<" + qName + attributesToString(attributes) + ">");
        } // END: startElement()



        @Override
        public void endElement(String uri, String localName, String qName)
            throws SAXException {

          b.append("</" + qName + ">");
        } // END: endElement



        @Override
        public void characters(char ch[], int start, int length)
            throws SAXException {

          b.append(new String(ch, start, length));

        } // END: characters()



      }; // END: DefaultHandler

      saxParser.parse("./src/main/ressources/XmlTest/validWithAttributesCommentsInlineElements.xml", handler);

      System.out.println(b.toString());

    } catch (Exception e) {
      e.printStackTrace();

    } // END: try

  } // END: main



  public static String attributesToString(Attributes a) {
    StringBuffer sb = new StringBuffer();
    for(int i = 0; i < a.getLength(); i++) {
      sb
        .append(" ")
        .append(a.getQName(i))
        .append("=\"")
        .append(a.getValue(i))
        .append("\"");
    }
    return sb.toString();
  }



} // END: Class XmlParsing

I parse the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<A attr="1" aaa="2">
    <F>general</F>
    <B test="3">
        <C>element 1</C>
        <C>element 2</C>
        <C>element 3</C>
    </B>
    <D>general</D>
    <E>general</E>

    <inline-element/>
    <inline-element with="attributes"/>

    <!-- Comment -->

    <inline-element />
    <inline-element with="attributes" />

</A>

And get:

<A attr="1" aaa="2">
    <F>general</F>
    <B test="3">
        <C>element 1</C>
        <C>element 2</C>
        <C>element 3</C>
    </B>
    <D>general</D>
    <E>general</E>

    <inline-element></inline-element>
    <inline-element with="attributes"></inline-element>



    <inline-element></inline-element>
    <inline-element with="attributes"></inline-element>

</A>

It's fine with me that an <elem /> becomes <elem></elem>, but I'd really like to have access to the XML declaration and the comments.

To receive an event when the parser encounters a comment, you need to use a LexicalHandler. See https://docs.oracle.com/javase/tutorial/jaxp/sax/events.html

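// Implement a handler. LexicalHandler is an interface with several methods;
// org.xml.sax.ext.DefaultHandler2 supplies empty implementations of all of
// them, so only comment() has to be overridden here.
DefaultHandler2 handler = new DefaultHandler2() {
    @Override
    public void comment(char[] ch, int start, int length) throws SAXException {
        // ...
    }
};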

// Use the handler

SAXParser saxParser = factory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler",
                      handler); 
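
Putting the pieces together, here is a minimal sketch of how the lexical handler could be combined with the echoing code from the question. The class name SaxEchoSketch and the helpers echo, xmlVersion and parse are hypothetical names chosen for illustration. The sketch assumes the parser supports the lexical-handler property (the parser bundled with the JDK does). SAX does not report the XML declaration as an event of its own; the xmlVersion helper assumes the parser passes an org.xml.sax.ext.Locator2 to setDocumentLocator, which is an optional extension, and it returns null otherwise.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.Locator2;

public class SaxEchoSketch {

    /** Hypothetical helper: echoes elements, text and comments of the given XML text. */
    public static String echo(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        parse(xml, new DefaultHandler2() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    out.append(' ').append(atts.getQName(i))
                       .append("=\"").append(atts.getValue(i)).append('"');
                }
                out.append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            // LexicalHandler callback: only invoked because the handler is
            // also registered through the lexical-handler property below.
            @Override
            public void comment(char[] ch, int start, int length) {
                out.append("<!--").append(ch, start, length).append("-->");
            }
        });
        return out.toString();
    }

    /** Hypothetical helper: XML version as reported by the parser, or null without Locator2 support. */
    public static String xmlVersion(String xml) throws Exception {
        String[] version = new String[1];
        parse(xml, new DefaultHandler2() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Query the locator once, at the first element (assumption:
                // the declaration has been read by then).
                if (version[0] == null && locator instanceof Locator2) {
                    version[0] = ((Locator2) locator).getXMLVersion();
                }
            }
        });
        return version[0];
    }

    // Registers the same object as content handler and as lexical handler.
    private static void parse(String xml, DefaultHandler2 handler) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("<A attr=\"1\"><!-- Comment --><B>text</B></A>"));
    }
}
```

Parsing goes through the XMLReader directly so that the handler registered as lexical handler is certain to be the one in use. Like the code in the question, this sketch does not re-escape characters such as & or < when writing text and attribute values back out. Locator2 also offers getEncoding(); the standalone flag of the declaration is not exposed this way.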
