How can I parse a namespace using the SAX parser?

Question

Using a Twitter search URL, e.g. http://search.twitter.com/search.rss?q=android, returns RSS that has an item that looks like:

<item>
      <title>@UberTwiter still waiting for @ubertwitter  android app!!!</title>
      <link>http://twitter.com/meals69/statuses/21158076391</link>
      <description>still waiting for an app!!!</description>
      <pubDate>Sat, 14 Aug 2010 15:33:44 +0000</pubDate>
      <guid>http://twitter.com/meals69/statuses/21158076391</guid>
      <author>Some Twitter User</author>
      <media:content type="image/jpg" height="48" width="48" url="http://a1.twimg.com/profile_images/756343289/me2_normal.jpg"/>
      <google:image_link>http://a1.twimg.com/profile_images/756343289/me2_normal.jpg</google:image_link>
      <twitter:metadata>
        <twitter:result_type>recent</twitter:result_type>
      </twitter:metadata>
</item>

Pretty simple. My code parses out everything (title, link, description, pubDate, etc.) without any problems. However, I'm getting null on:

<google:image_link>

I'm using Java to parse the RSS feed. Do I have to handle compound (prefixed) local names differently than I would a simpler local name?

This is the bit of code that parses out Link, Description, pubDate, etc:

    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        if (this.currentMessage != null){
            if (localName.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(builder.toString());
            } else if (localName.equalsIgnoreCase(LINK)){
                currentMessage.setLink(builder.toString());
            } else if (localName.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(builder.toString());
            } else if (localName.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(builder.toString());
            } else if (localName.equalsIgnoreCase(GUID)){
                currentMessage.setGuid(builder.toString());
            } else if (uri.equalsIgnoreCase(AVATAR)){
                currentMessage.setAvatar(builder.toString());
            } else if (localName.equalsIgnoreCase(ITEM)){
                messages.add(currentMessage);
            } 
            builder.setLength(0);   
        }
    }

startDocument looks like:

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        messages = new ArrayList<Message>();
        builder = new StringBuilder();

    }

startElement looks like:

    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        if (localName.equalsIgnoreCase(ITEM)){
            this.currentMessage = new Message();
        } 
    }

Tony

Answer 1

An element like <google:image_link> has the local name image_link belonging to the google namespace. You need to ensure that the XML parsing framework is aware of namespaces, and you'd then need to find this element using the appropriate namespace.

For example, a few SAX1 interfaces in the package org.xml.sax have been deprecated and replaced by SAX2 counterparts that include namespace support (e.g. the SAX1 Parser is deprecated and replaced by the SAX2 XMLReader). Consult the documentation on how to specify the namespace URI or the qualified (prefixed) qName.
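As a rough illustration of matching by namespace URI plus local name, here is a minimal sketch using the JDK's JAXP SAX parser with namespace awareness switched on. The class name, the helper method, and the sample XML are made up for this example, and the URI bound to the google prefix is an assumption: use whatever the xmlns:google declaration at the top of the real feed says.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareSaxDemo {
    // Assumed URI for the "google" prefix; check the feed's xmlns:google declaration.
    static final String GOOGLE_NS = "http://base.google.com/ns/1.0";

    // Returns the text of the first {GOOGLE_NS}image_link element, or null if none is found.
    static String extractImageLink(String xml) {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Match on namespace URI + local name, never on the prefix.
                if (result[0] == null && GOOGLE_NS.equals(uri) && "image_link".equals(localName)) {
                    result[0] = text.toString().trim();
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xml = "<rss xmlns:google=\"" + GOOGLE_NS + "\"><channel><item>"
                + "<title>hello</title>"
                + "<google:image_link>http://example.com/me.jpg</google:image_link>"
                + "</item></channel></rss>";
        System.out.println(extractImageLink(xml));
    }
}
```

Because the comparison uses the URI, the handler keeps working even if the feed later picks a different prefix for the same namespace.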

Answer 2

From the sample it is not actually clear which namespace the 'google' prefix binds to. The previous answer is slightly incorrect in that the element is NOT in a "google" namespace; rather, it is in whatever namespace the prefix "google" is bound to. As such, you have to match the namespace (identified by its URI), and not the prefix. SAX does have a confusing way of reporting local name / namespace-prefix combinations, and what it reports depends on whether namespace processing is even enabled.

You could also consider alternative XML processing libraries and APIs; while SAX implementations are performant, there are alternatives that are just as fast and more convenient. StAX (javax.xml.stream.*) implementations like Woodstox (and even the default one that JDK 1.6 comes with) are fast and a bit more convenient. The StaxMate library, which builds on top of StAX, is much simpler to use for both reading and writing, and speed-wise it is as fast as SAX implementations like Xerces. The StAX API also has less baggage with respect to namespace handling, so it is easier to see the actual namespace of an element.
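To give an idea of what the StAX route looks like, here is a small sketch using only the javax.xml.stream classes that ship with the JDK (not Woodstox or StaxMate). The class and method names are invented for the example, and the namespace URI is passed in by the caller rather than assumed.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxNamespaceDemo {
    // Returns the text of the first element with the given namespace URI and
    // local name, or null if the document contains no such element.
    static String findText(String xml, String namespaceUri, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && namespaceUri.equals(reader.getNamespaceURI())
                            && localName.equals(reader.getLocalName())) {
                        // Valid for elements that contain text only.
                        return reader.getElementText();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return null;
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/2003/01/geo/wgs84_pos#";
        String xml = "<item xmlns:geo=\"" + ns + "\"><title>My Item title</title>"
                + "<geo:lat>40.720741</geo:lat></item>";
        System.out.println(findText(xml, ns, "lat"));
    }
}
```

The reader hands back the namespace URI and the local name as two separate values, so there is no prefix string to pick apart.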

Answer 3

Like user polygenelubricants said: generally, the parser needs to be namespace-aware to handle elements that belong to a prefixed namespace (like that <google:image_link> element).

This needs to be set as a "parser feature", which AFAIK can be done in a few different ways. The XMLReader interface itself has a setFeature() method that sets features for a particular parser, but you can also use the same method on the SAXParserFactory class, so that the factory generates parsers with those features already on or off. The SAX2 standard feature flags should be listed on the SAX project's website, and at least some of them are also listed in the Java API documentation of the package org.xml.sax.
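Both ways of setting the feature can be sketched roughly like this. The two feature URIs are the standard SAX2 namespace flags; the class and method names are just for illustration, and the behaviour I describe is what I'd expect from the parser bundled with the JDK.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class SaxFeatureDemo {
    // Standard SAX2 feature identifiers.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    // Variant 1: configure the factory, so every parser it creates has the feature on.
    static XMLReader readerFromConfiguredFactory() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(NAMESPACES, true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not configure the SAX factory", e);
        }
    }

    // Variant 2: configure one particular XMLReader after it has been created.
    static XMLReader readerConfiguredDirectly() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);          // report URI + local name
            reader.setFeature(NAMESPACE_PREFIXES, false); // do not report xmlns* attributes
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException("Could not configure the XMLReader", e);
        }
    }

    // Convenience wrapper so callers need not handle the checked SAX exceptions.
    static boolean isNamespaceAware(XMLReader reader) {
        try {
            return reader.getFeature(NAMESPACES);
        } catch (Exception e) {
            throw new IllegalStateException("Feature not readable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isNamespaceAware(readerFromConfiguredFactory()));
        System.out.println(isNamespaceAware(readerConfiguredDirectly()));
    }
}
```

SAXParserFactory also has setNamespaceAware(true), which is the JAXP shorthand for the same namespaces feature.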

For simple documents you can try to take a shortcut. If you don't actually care about namespaces (that is, element names as a URI + local-name combination), and you can trust that the elements you are looking for (and only those) always have a certain prefix, and that no elements from other namespaces share the same local name, then you might solve your problem just by using the qName parameter of the startElement() method instead of localName (or vice versa), or by adding or dropping the prefix in the tag name string you compare against.

The contents of the parameters namespaceUri, qName and localName are, according to the Java specs, actually optional, and AFAIK they might be empty for this reason (the SAX documentation specifies an empty string rather than null). Which of them are empty depends on the aforementioned "parser features" that affect namespaces. I don't know whether the parameter that is empty can vary between elements in a namespace and elements without a namespace; I haven't investigated that behaviour.
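If you want to see for yourself what your parser reports, a quick experiment along these lines may help. It parses the same snippet twice, once without and once with namespace awareness, and records the three name parameters of every start tag. The class name, the sample XML, and the namespace URI in it are assumptions made for the demo.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNameReportingDemo {
    // Hypothetical sample; the URI bound to "google" is an assumption.
    static final String XML =
            "<item xmlns:google=\"http://base.google.com/ns/1.0\">"
            + "<title>t</title><google:image_link>x</google:image_link></item>";

    // Returns one "uri|localName|qName" entry per start tag, in document order.
    static List<String> reportNames(boolean namespaceAware) {
        final List<String> seen = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(uri + "|" + localName + "|" + qName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("not namespace-aware: " + reportNames(false));
        System.out.println("namespace-aware:     " + reportNames(true));
    }
}
```

With the parser bundled in the JDK I would expect the non-aware run to leave the URI empty and carry the full prefixed name in qName, while the aware run fills in the URI and the bare local name. Other parsers (Android's, for instance) may fill the fields differently, which is exactly why it is worth checking.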

PS. XML is case sensitive. So ideally you don't need to ignore case in tag name string comparison.
-First post, yay!

Answer 4

This might help someone using the Android SAX util. I was trying geo:lat to get the lat element from the geo namespace.

Sample XML:

<item> 
 <title>My Item title</title> 
 <geo:lat>40.720741</geo:lat> 
</item>

First attempt returned null:

item.getChild("geo:lat");

As suggested above, I found passing the namespace URI to the getChild method worked.

item.getChild("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat");

Answer 5

Using the startPrefixMapping method of my XML handler, I was able to parse out the text of a namespaced element.

I placed several calls to this method beneath my handler instantiation.

GoogleReader xmlhandler = new GoogleReader();
xmlhandler.startPrefixMapping("dc", "http://purl.org/dc/elements/1.1/");

where dc is the namespace prefix, as in <dc:author>some text</dc:author>
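For comparison, startPrefixMapping is normally a callback: when namespace processing is enabled, the parser itself calls it for every xmlns declaration it meets. Here is a minimal sketch of a handler that simply records those prefix-to-URI bindings. The class name and the sample XML are made up, and this is not the GoogleReader handler from above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PrefixMappingDemo {
    // Returns every prefix -> namespace URI binding the parser reports.
    static Map<String, String> collectPrefixes(String xml) {
        final Map<String, String> prefixes = new LinkedHashMap<String, String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                // Called by the parser just before the element that declares the prefix.
                prefixes.put(prefix, uri);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // prefix mappings are only reported in this mode
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        String xml = "<item xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<dc:author>some text</dc:author></item>";
        System.out.println(collectPrefixes(xml));
    }
}
```

A map like this lets you look up which URI a prefix such as dc is bound to in the document you are actually parsing, instead of hard-coding it.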

solution1
1 2010-08-14 16:05:25

solution2
1 2010-08-15 03:56:09

solution3
0 2010-08-15 09:41:27

solution4
0 2010-12-29 10:52:02

solution5
0 2011-04-17 09:23:57
