
Using the SAX characters method to parse PCDATA from an XML element

I'm using the SAX API to parse an XML document, but I'm struggling to store the element PCDATA for each location in the XML.

The Oracle documentation for the SAX API shows that characters() is used to read PCDATA from an element, but I'm not sure how it is supposed to be called.

In my current implementation, boolean flags signal when a certain element in the XML document has been encountered. The flags are set in startElement(), as they should be, when an element is encountered.

I set a breakpoint on the boolean variable description in characters(), but the boolean isn't set to true until startElement() is called, meaning the PCDATA is never parsed.

My question is: how can I call characters() after the boolean values are set in startElement()?

This is the startElement(), which is called after characters():

public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        if (qName.equals("location")){
            location = true;

            System.out.println("Found a location...");
            try {
                //Read in the values for the attributes of the element <location>
                int locationID = Integer.parseInt(atts.getValue("id"));
                String locationName = atts.getValue("name");


                //Generate a new instance of Location on-the-fly using reflection. The statement Class.forName("gmit.Location").newInstance(); invokes the
                //Java Class Loader and then calls the null (default) constructor of Location.
                Location loc = (Location) Class.forName("gmit.Location").newInstance();
                loc.setId(locationID); //Now configure the Location object with an ID, Name, Description etc...
                loc.setName(locationName);
                loc.setDescription(locationDescription);


            } catch (Exception e) {
                e.printStackTrace();
            }

        }else if (qName.equals("description")){
            description = true;
            //need to invoke the characters method here after the description
            //flag is set to true
            System.out.println("Found a description. You should tie this to the last location you encountered...");
        }
    }

characters() is called as soon as the program starts, but it needs to be called after the boolean flags are set in the method above:

public void characters(char[] ch, int start, int length) throws SAXException {
        if (location) {

        } else if (description) {

            locationDescription = new String(ch, start, length);
            System.out.println("Description = " + locationDescription);
        }
    }

A sample of one of the locations in the XML file:

<location id="1" name="Tiberius">
    <description>
        You are in the city of Tiberius. You see a long street with high buildings and a castle. You see an exit to the south.
    </description>
    <exit title="Desert" direction="S"/>
</location>

"How can I call characters() after the boolean values are set in startElement()?"

You can't. The whole point of SAX parsing is that the parser calls your handler's methods; you don't call them yourself.
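To make the callback order visible, here is a minimal sketch that records each event as the parser reports it. The class name CallbackOrderDemo and the helper traceEvents are made up for illustration; only the javax.xml.parsers and org.xml.sax calls are standard JDK API. It assumes the default, non-namespace-aware parser, so qName carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs the order in which the SAX parser invokes the callbacks.
class CallbackOrderDemo {

    // Parses the given XML string and returns a trace of the callbacks that fired.
    static List<String> traceEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only chunks so the trace stays readable.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("chars:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        // The application only starts the parse; the parser then drives the handler.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<location id=\"1\" name=\"Tiberius\">"
                + "<description>You are in the city of Tiberius.</description>"
                + "</location>";
        traceEvents(xml).forEach(System.out::println);
    }
}
```

In the trace, the text of a description element is reported after the startElement call for description and before the matching endElement call, so a flag set in startElement is already in place when that text arrives.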

Your characters method will be called each time the SAX parser encounters character data in the document. Your handler needs to decide whether this data is relevant (is it a location, a description, or something that can be ignored?) and, if it is, store it somewhere it can be retrieved later.

You've shown us the startElement method you are using. If you haven't done so already, you will also want to override endElement. You need to set the boolean values location and description to false in endElement, so that your SAX handler knows it is no longer inside a location or description element, as appropriate.
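As a rough sketch of that idea, the handler below sets a flag in startElement, collects text in characters while the flag is set, and clears the flag in endElement. The class name DescriptionHandler, the field names, and the parseDescription helper are assumptions for illustration, not part of the original code. It appends to a StringBuilder rather than assigning a new String, because a SAX parser is allowed to deliver one run of text in several characters calls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: captures the text of <description> elements.
class DescriptionHandler extends DefaultHandler {
    private boolean inDescription = false;                   // true only between <description> and </description>
    private final StringBuilder buffer = new StringBuilder();
    private String lastDescription;                          // most recently completed description

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("description")) {
            inDescription = true;
            buffer.setLength(0);                             // start collecting fresh text
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inDescription) {
            buffer.append(ch, start, length);                // may be called more than once per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("description")) {
            inDescription = false;                           // no longer inside the element
            lastDescription = buffer.toString().trim();
        }
    }

    String getLastDescription() {
        return lastDescription;
    }

    // Convenience helper (an assumption for this sketch): parse a string and return the description.
    static String parseDescription(String xml) throws Exception {
        DescriptionHandler handler = new DescriptionHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLastDescription();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<location id=\"1\" name=\"Tiberius\">"
                + "<description>\n  You are in the city of Tiberius.\n</description>"
                + "<exit title=\"Desert\" direction=\"S\"/>"
                + "</location>";
        System.out.println("Description = " + parseDescription(xml));
    }
}
```

Because the complete text is only known once endElement fires for description, that is also the natural place to attach it to the last location seen, rather than inside startElement for location.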

You haven't shown us a sample XML document. Perhaps you have something like this:

<widgetList>
    <widget>
        <name>First widget</name>
        <location>Over there</location>
        <description>This is the first widget in the list</description>
    </widget>
    <widget>
        <name>Second widget</name>
        <location>Very far away</location>
        <description>This is the second widget in the list</description>
    </widget>
</widgetList>

If so, you may want to handle the end of the widget element as well. For example, the handler could take the last location and description it encountered, put them together in a Widget object, and store that in a list inside the handler. At the end of parsing, you can then read the list of widgets from the handler.
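One possible shape for such a handler is sketched below, assuming the widgetList layout above. The Widget class, the WidgetHandler class, and the parseWidgets helper are all invented for this example. The handler keeps the text of the most recent name, location and description elements and bundles them into a Widget when the closing widget tag arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical value class holding the three child values of one <widget>.
class Widget {
    final String name;
    final String location;
    final String description;

    Widget(String name, String location, String description) {
        this.name = name;
        this.location = location;
        this.description = description;
    }
}

// Hypothetical handler: builds a list of Widget objects while the document is parsed.
class WidgetHandler extends DefaultHandler {
    private final List<Widget> widgets = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String name;
    private String location;
    private String description;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                       // discard text that belonged to the enclosing element
        if (qName.equals("widget")) {
            name = null;                         // reset per-widget state
            location = null;
            description = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);          // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "name":        name = text.toString().trim(); break;
            case "location":    location = text.toString().trim(); break;
            case "description": description = text.toString().trim(); break;
            case "widget":      widgets.add(new Widget(name, location, description)); break;
            default:            break;           // ignore everything else, e.g. widgetList
        }
    }

    List<Widget> getWidgets() {
        return widgets;
    }

    // Convenience helper (an assumption for this sketch): parse a string and return the widgets.
    static List<Widget> parseWidgets(String xml) throws Exception {
        WidgetHandler handler = new WidgetHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getWidgets();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<widgetList><widget><name>First widget</name>"
                + "<location>Over there</location>"
                + "<description>This is the first widget in the list</description>"
                + "</widget></widgetList>";
        for (Widget w : parseWidgets(xml)) {
            System.out.println(w.name + " / " + w.location + " / " + w.description);
        }
    }
}
```

This sketch does not handle mixed content (text interleaved with child elements inside the same element), which the sample document does not use.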
