Parsing XML using SAX

Sample XML:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Issue>
     <Snippet>     
           sri;;
           hiil
           bye;
           tc;
    </Snippet>
    </Issue>

Is it possible to get all of the text inside the Snippet tag?

This is the current implementation:

    public void startElement(String uri, String localName,
             String qName, Attributes attributes) throws SAXException {
      temp = "";
      if (qName.equalsIgnoreCase("Issue")) {
             acct = new Account();
      }
    }

    public void endElement(String uri, String localName, String qName)
             throws SAXException {
      if (qName.equalsIgnoreCase("Issue")) {
             // add it to the list
             accList.add(acct);
      } else if (qName.equalsIgnoreCase("Snippet")) {
             acct.setPrimarySnippet(temp);
      }
    }

The output is "tc;", but I need the entire contents of the Snippet tag to be printed.

An ArrayList is used, with getter and setter methods to store and retrieve the values.

Use the characters method:

http://docs.oracle.com/javase/1.5.0/docs/api/org/xml/sax/helpers/DefaultHandler.html#characters(char[], int, int)

That is, implement startElement and endElement to signal that you are entering and exiting the Snippet tag; the characters method will then deliver the text between them.

    public void startElement(String uri, String localName,
             String qName, Attributes attributes) throws SAXException {
      temp = "";
      if (qName.equalsIgnoreCase("Snippet")) {
             someFlagVariable = true;   // entering the Snippet tag
      }
    }

    public void endElement(String uri, String localName, String qName)
             throws SAXException {
      if (qName.equalsIgnoreCase("Snippet")) {
             someFlagVariable = false;  // leaving the Snippet tag
      }
    }

    public void characters(char[] ch, int start, int length)
             throws SAXException {
      if (someFlagVariable) {
             // this is your content (possibly only one chunk of it)
             String content = new String(ch, start, length).trim();
      }
    }
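As a minimal, self-contained sketch of the flag approach described above: the class name SnippetFlagDemo, the handler SnippetHandler, and the helper extractSnippet are hypothetical names chosen for illustration, and the Account class from the question is left out. Unlike the fragment above, it appends each chunk to a StringBuilder, on the assumption that characters() may be called more than once for the same element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SnippetFlagDemo {

    /** Collects the text of a Snippet element, using a flag to track position. */
    static class SnippetHandler extends DefaultHandler {
        private boolean insideSnippet = false;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if (qName.equalsIgnoreCase("Snippet")) {
                insideSnippet = true;
                text.setLength(0); // start fresh for this Snippet
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("Snippet")) {
                insideSnippet = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (insideSnippet) {
                // Append rather than overwrite: this may be only one chunk.
                text.append(ch, start, length);
            }
        }

        String getText() {
            return text.toString().trim();
        }
    }

    /** Hypothetical helper: parses the XML string and returns the Snippet text. */
    static String extractSnippet(String xml) throws Exception {
        SnippetHandler handler = new SnippetHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Issue><Snippet>\n sri;;\n hiil\n bye;\n tc;\n</Snippet></Issue>";
        System.out.println(extractSnippet(xml));
    }
}
```

The trim() is applied once, on the accumulated text, so whitespace between chunks is preserved while the leading and trailing blank lines are dropped.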

Yes.

You should be grabbing the value for "temp" (the value you set as the primary snippet) in the characters() method.

However, be aware that there is no guarantee about how characters() is called: the parser may call it several times within a single node, each time with only part of the text. So within your override of characters() you need to build the string up; that way, when you get to endElement(), you will have the complete value.

You can see an example implementation here

But you basically want something like:

    StringBuffer chars = new StringBuffer();

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // start a fresh buffer for every element
        chars = new StringBuffer();
    }

    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase("Issue")) {
            // add it to the list
            accList.add(acct);
        } else if (qName.equalsIgnoreCase("Snippet")) {
            // the buffer now holds the complete text of the Snippet element
            acct.setPrimarySnippet(chars.toString());
        }
    }

    public void characters(char ch[], int start, int length) {
        // may be called several times per element, so append
        chars.append(ch, start, length);
    }

(Note that the above only works if you only care about text in leaf nodes, because a new StringBuffer is created on every startElement(). If you want the text of non-leaf nodes, you would need to introduce flags in startElement() so that you only re-instantiate the StringBuffer at the right time.)
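For the non-leaf case, one alternative to flags (not what the answer above proposes, just a sketch of a different technique) is to keep a stack of buffers, one per open element, so that a child's startElement() does not discard its parent's text. The names PerElementTextDemo, TextHandler, and textByElement are hypothetical, and the map is keyed by element name, so a repeated element would overwrite the earlier entry.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PerElementTextDemo {

    /** Keeps one buffer per open element so parent text survives child elements. */
    static class TextHandler extends DefaultHandler {
        private final Deque<StringBuilder> buffers = new ArrayDeque<>();
        // Direct text of each element, keyed by qName (later duplicates overwrite).
        final Map<String, String> textByElement = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            buffers.push(new StringBuilder());
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            textByElement.put(qName, buffers.pop().toString().trim());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!buffers.isEmpty()) {
                // Text belongs to the innermost element that is currently open.
                buffers.peek().append(ch, start, length);
            }
        }
    }

    /** Hypothetical helper: returns the direct text of every element in the XML. */
    static Map<String, String> collectText(String xml) throws Exception {
        TextHandler handler = new TextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.textByElement;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Issue>before<Snippet>sri;; tc;</Snippet>after</Issue>";
        System.out.println(collectText(xml));
    }
}
```

In this sketch the text of Issue is the concatenation of its own character data on either side of Snippet; the text of Snippet is recorded separately.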

1) To print the text inside Snippet, you should implement

public void characters(char ch[], int start, int length)

2) The text inside Snippet spans several lines, and with SAX you may receive it in several separate characters() calls (here, one per line). This behaviour is documented in the SAX API. It may depend on the provider, but at least with the JDK's default SAX parser you cannot change it. Try StAX: it has the XMLInputFactory.IS_COALESCING option, which fixes this problem.
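A rough sketch of the StAX suggestion, assuming the JDK's built-in StAX implementation honours IS_COALESCING; the class name StaxCoalescingDemo and the helper extractSnippet are hypothetical. With coalescing enabled, the reader is asked to report adjacent character data as a single CHARACTERS event, so the first such event inside Snippet is taken as the whole text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxCoalescingDemo {

    /** Hypothetical helper: returns the Snippet text, or null if none is found. */
    static String extractSnippet(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            boolean insideSnippet = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "Snippet".equalsIgnoreCase(reader.getLocalName())) {
                    insideSnippet = true;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "Snippet".equalsIgnoreCase(reader.getLocalName())) {
                    insideSnippet = false;
                } else if (event == XMLStreamConstants.CHARACTERS && insideSnippet) {
                    // Relies on coalescing: one event is assumed to hold all the text.
                    return reader.getText().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Issue><Snippet>\n sri;;\n hiil\n bye;\n tc;\n</Snippet></Issue>";
        System.out.println(extractSnippet(xml));
    }
}
```

If the element contains only text, XMLStreamReader.getElementText() is another option, since it returns the complete text of the current element in one call.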
