Get all the tags and values from XML using a SAX parser in Java

I am trying to parse XML using SAX. I want all the tags and their values from the XML in a nested way. Is this possible with a SAX parser? Can anyone provide an example? (I think SAX is more efficient than the W3C DocumentBuilder, so I chose it, and I want to know whether I'm on the right path.) I'm attaching my Java program:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class MySAXApp extends DefaultHandler
{
    public MySAXApp ()
    {
        super();
    }
    public void startDocument ()
    {
        System.out.println("Start document");
    }
    public void endDocument ()
    {
        System.out.println("End document");
    }


    public void startElement (String uri, String name,
            String qName, Attributes atts)
    {
        System.out.println(atts.getLength());
        if ("".equals (uri))
            System.out.println("Start element: " + qName);
        else
            System.out.println("Start element: {" + uri + "}" + name);
    }

}

Here is my XML. Is this valid XML? Are there any errors in writing XML like this?

<?xml version="1.0" encoding="utf-8"?>
<CustomerReport xsi:schemaLocation="Customer.xsd">
    <Customer>
        <CustomerName>str1234</CustomerName>
        <CustomerStatus>str1234</CustomerStatus>
        <PurchaceOrders>
            <PurchaceOrder>
                <PurchaceOrderName>str1234</PurchaceOrderName>
            </PurchaceOrder>
        </PurchaceOrders>
    </Customer>
</CustomerReport>

I'm new to XML. Can someone help me with this?

When you say SAX is "more efficient", what you actually mean is that a SAX parser does the minimum amount of work, leaving most of the work to the application. That means you (the application writer) have more code to write, and it's quite tricky code, as you are discovering. Because the people who write XML parsers are much more experienced Java coders than you are, it's likely that the more work you do in your code, and the less you do within the parser, the less efficient your overall application will be. So given your level of experience, my advice would be to use a parsing approach where the library does as much of the work as possible. I would suggest using JDOM2.
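JDOM2 is a third-party library, so as a dependency-free illustration of the same idea (let the library build the tree, then walk it), here is a minimal sketch that swaps in the JDK's built-in DOM parser (javax.xml.parsers.DocumentBuilder) for JDOM2. The class name NestedDomPrinter and the helpers render and describe are made up for this example, and the sample is a shortened version of the XML in the question with the xsi attribute left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedDomPrinter {

    // Shortened sample based on the XML in the question (assumption:
    // the xsi:schemaLocation attribute is omitted here).
    static final String SAMPLE =
        "<CustomerReport><Customer>"
      + "<CustomerName>str1234</CustomerName>"
      + "<CustomerStatus>str1234</CustomerStatus>"
      + "</Customer></CustomerReport>";

    // Appends one line per element, indented by nesting depth.
    static void render(Element element, String indent, StringBuilder out) {
        out.append(indent).append(element.getTagName());
        NodeList children = element.getChildNodes();
        boolean hasChildElements = false;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
            }
        }
        if (!hasChildElements) {
            // Leaf element: treat its text content as the tag's value.
            out.append(" = ").append(element.getTextContent().trim());
        }
        out.append('\n');
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                render((Element) child, indent + "  ", out);
            }
        }
    }

    // Parses the XML into a DOM tree and returns the nested listing.
    static String describe(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        render(doc.getDocumentElement(), "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe(SAMPLE));
    }
}
```

The trade-off is the one discussed in this thread: the whole document is held in memory, but the application code only has to walk a tree.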

The only attribute in the XML you posted is the one with the xsi prefix, on the root element. For every other element, atts.getLength() should return 0.

Attributes are key-value pairs inside a tag. Most of your XML content is inside elements.
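To make the distinction concrete, here is a small sketch that records the attribute count and the attribute names for each start tag. The class name AttributeLister and the helper list are hypothetical, and the sample is a shortened version of the posted XML. It assumes a parser that is not namespace-aware (set explicitly below), because the xsi prefix is never declared in the posted XML and a namespace-aware parser would be expected to reject it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeLister extends DefaultHandler {

    // Shortened version of the XML from the question.
    static final String SAMPLE =
        "<CustomerReport xsi:schemaLocation=\"Customer.xsd\">"
      + "<Customer><CustomerName>str1234</CustomerName></Customer>"
      + "</CustomerReport>";

    // One summary line per start tag, in document order.
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        StringBuilder line = new StringBuilder(qName)
            .append(": ").append(atts.getLength()).append(" attribute(s)");
        // Attributes are the name="value" pairs written inside the tag.
        for (int i = 0; i < atts.getLength(); i++) {
            line.append(' ').append(atts.getQName(i))
                .append('=').append(atts.getValue(i));
        }
        lines.add(line.toString());
    }

    static List<String> list(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without namespace processing, "xsi:schemaLocation" is treated
        // as an ordinary attribute name that happens to contain a colon.
        factory.setNamespaceAware(false);
        AttributeLister handler = new AttributeLister();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        list(SAMPLE).forEach(System.out::println);
    }
}
```

Only the CustomerReport line reports an attribute; the text str1234 is element content and never shows up in the Attributes object.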

The efficiency advantage of SAX (or StAX) over something like JDOM comes from the SAX parser not keeping all the data it reads in memory. If you use the ContentHandler to retrieve data and save it as it gets read, your program doesn't have to consume that much memory.

Read this tutorial or this JavaWorld article. You need to implement a characters method in order to get any element text. Both linked articles have good examples of how to implement your characters method so that you can retrieve element text.

You are likely to find a lot of bad examples of this if you google around (bad example) or search on Stack Overflow (bad example here), but the example implementations in the linked articles are correct, because they buffer the output from the characters method until all characters have been found:

Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.
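The effect described in that quote can be sketched without a real parser by calling the handler methods directly and delivering the text str1234 in two chunks. Everything here is illustrative: ChunkedTextDemo, LastChunkHandler, BufferingHandler and feed are hypothetical names, and a StringBuilder stands in for the StringBuffer mentioned in the quote.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class ChunkedTextDemo {

    // Naive handler: keeps only the most recent chunk of text.
    static class LastChunkHandler extends DefaultHandler {
        String value = "";

        @Override
        public void characters(char[] ch, int start, int length) {
            value = new String(ch, start, length);
        }
    }

    // Buffering handler: appends every chunk and reads the buffer
    // only when the element ends.
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        String value = "";

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            buffer.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            value = buffer.toString();
        }
    }

    // Simulates a parser that reports "str1234" as two characters() calls.
    static void feed(DefaultHandler handler) throws Exception {
        char[] text = "str1234".toCharArray();
        handler.startElement("", "", "CustomerName", new AttributesImpl());
        handler.characters(text, 0, 3);   // "str"
        handler.characters(text, 3, 4);   // "1234"
        handler.endElement("", "", "CustomerName");
    }

    public static void main(String[] args) throws Exception {
        LastChunkHandler naive = new LastChunkHandler();
        BufferingHandler buffering = new BufferingHandler();
        feed(naive);
        feed(buffering);
        System.out.println("naive:     " + naive.value);      // 1234
        System.out.println("buffering: " + buffering.value);  // str1234
    }
}
```

Where a real parser splits the text is up to the implementation, which is why the naive version can appear to work in small tests and then fail on larger input.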

Here is the ContentHandler from the JavaWorld article's hello world example, changed to use your XML:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
public class Example2 extends DefaultHandler {
   // Local variables to store data
   // found in the XML document
   public String name = "";
   public String status = "";
   public String orderName = "";
   // Buffer for collecting data from
   // the "characters" SAX event.
   private CharArrayWriter contents = new CharArrayWriter();
   // Override methods of the DefaultHandler class
   // to gain notification of SAX events.
   // See org.xml.sax.ContentHandler for all available events.
   public void startElement( String namespaceURI,
              String localName,
              String qName,
              Attributes attr ) throws SAXException {
      contents.reset();
   }
   public void endElement( String namespaceURI,
              String localName,
              String qName ) throws SAXException {
      if ( localName.equals( "CustomerName" ) ) {
         name = contents.toString();
      }
      if ( localName.equals( "CustomerStatus" ) ) {
         status = contents.toString();
      }
      if (localName.equals("PurchaceOrderName")) {
         orderName = contents.toString();
      }
   }
   public void characters( char[] ch, int start, int length )
                  throws SAXException {
      contents.write( ch, start, length );
   }
}
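For the original request (every tag and its value, nested), the same buffering idea can be combined with a depth counter. The following is a sketch under a few assumptions rather than a definitive implementation: NestedSaxPrinter and parse are hypothetical names, the parser comes from the JDK's SAXParserFactory with namespace processing switched off (so the undeclared xsi prefix in the posted XML is accepted), and elements are identified by qName because localName is the empty string when namespace processing is not performed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NestedSaxPrinter extends DefaultHandler {

    // The XML from the question, unchanged.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
      + "<CustomerReport xsi:schemaLocation=\"Customer.xsd\">\n"
      + "    <Customer>\n"
      + "        <CustomerName>str1234</CustomerName>\n"
      + "        <CustomerStatus>str1234</CustomerStatus>\n"
      + "        <PurchaceOrders>\n"
      + "            <PurchaceOrder>\n"
      + "                <PurchaceOrderName>str1234</PurchaceOrderName>\n"
      + "            </PurchaceOrder>\n"
      + "        </PurchaceOrders>\n"
      + "    </Customer>\n"
      + "</CustomerReport>\n";

    final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;
    // True while the most recent event was a start tag, i.e. the
    // current element has had no child elements so far.
    private boolean leaf = false;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        text.setLength(0);
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        lines.add(line.append(qName).toString());
        depth++;
        leaf = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        String value = text.toString().trim();
        if (leaf && !value.isEmpty()) {
            // The last line added belongs to this leaf element.
            int last = lines.size() - 1;
            lines.set(last, lines.get(last) + " = " + value);
        }
        leaf = false;
        text.setLength(0);
    }

    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false);
        NestedSaxPrinter handler = new NestedSaxPrinter();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        parse(SAMPLE).forEach(System.out::println);
    }
}
```

Note that the handler above compares localName, which suits a namespace-aware parser; with a parser that is not namespace-aware, comparing qName as in this sketch is the safer choice.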
