
How can I parse a namespace using the SAX parser?

A Twitter search URL such as http://search.twitter.com/search.rss?q=android returns RSS containing items that look like this:

<item>
      <title>@UberTwiter still waiting for @ubertwitter  android app!!!</title>
      <link>http://twitter.com/meals69/statuses/21158076391</link>
      <description>still waiting for an app!!!</description>
      <pubDate>Sat, 14 Aug 2010 15:33:44 +0000</pubDate>
      <guid>http://twitter.com/meals69/statuses/21158076391</guid>
      <author>Some Twitter User</author>
      <media:content type="image/jpg" height="48" width="48" url="http://a1.twimg.com/profile_images/756343289/me2_normal.jpg"/>
      <google:image_link>http://a1.twimg.com/profile_images/756343289/me2_normal.jpg</google:image_link>
      <twitter:metadata>
        <twitter:result_type>recent</twitter:result_type>
      </twitter:metadata>
</item>

Pretty simple. My code parses out everything (title, link, description, pubDate, etc.) without any problems. However, I'm getting null on:

<google:image_link>

I'm using Java to parse the RSS feed. Do I have to handle compound local names differently than I would a simpler local name?

This is the bit of code that parses out link, description, pubDate, etc.:

@Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        if (this.currentMessage != null){
            if (localName.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(builder.toString());
            } else if (localName.equalsIgnoreCase(LINK)){
                currentMessage.setLink(builder.toString());
            } else if (localName.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(builder.toString());
            } else if (localName.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(builder.toString());
            } else if (localName.equalsIgnoreCase(GUID)){
                currentMessage.setGuid(builder.toString());
            } else if (uri.equalsIgnoreCase(AVATAR)){
                currentMessage.setAvatar(builder.toString());
            } else if (localName.equalsIgnoreCase(ITEM)){
                messages.add(currentMessage);
            } 
            builder.setLength(0);   
        }
    }

startDocument looks like:

@Override
    public void startDocument() throws SAXException {
        super.startDocument();
        messages = new ArrayList<Message>();
        builder = new StringBuilder();

    }

startElement looks like:

@Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        if (localName.equalsIgnoreCase(ITEM)){
            this.currentMessage = new Message();
        } 
    }

Tony

An element like <google:image_link> has the local name image_link belonging to the google namespace. You need to ensure that the XML parsing framework is aware of namespaces, and you'd then need to find this element using the appropriate namespace.

For example, a few SAX1 interfaces in the package org.xml.sax have been deprecated and replaced by SAX2 counterparts that include namespace support (e.g. the SAX1 Parser is deprecated and replaced by the SAX2 XMLReader). Consult the documentation on how to specify the namespace uri or the qualified (prefixed) qName.
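As a rough illustration of the point above, the following sketch turns on namespace awareness through JAXP and then matches the element by namespace URI plus local name. The namespace URI and the sample feed are placeholders, because the question does not show the feed's xmlns:google declaration; the class and method names are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareSaxDemo {
    // Placeholder URI (assumption): use whatever xmlns:google declares in the real feed.
    static final String GOOGLE_NS = "http://example.com/ns/google";

    // Hypothetical, trimmed-down feed used only for this sketch.
    static final String SAMPLE =
        "<rss xmlns:google=\"" + GOOGLE_NS + "\"><channel><item>"
      + "<title>hello</title>"
      + "<google:image_link>http://example.com/me.jpg</google:image_link>"
      + "</item></channel></rss>";

    /** Returns the text of the first image_link element in GOOGLE_NS, or null if none. */
    static String findImageLink(String xml) {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    // Match on namespace URI + local name, never on the prefix.
                    if (result[0] == null && GOOGLE_NS.equals(uri) && "image_link".equals(localName)) {
                        result[0] = text.toString().trim();
                    }
                    text.setLength(0);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(findImageLink(SAMPLE));
    }
}
```

Matching on the URI rather than the prefix means the handler keeps working if the feed later binds the same namespace to a different prefix.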


From the sample it is not actually clear which namespace the 'google' prefix binds to. The previous answer is slightly incorrect in that the element is NOT in a "google" namespace; rather, it is in whatever namespace the prefix "google" is bound to. As such, you have to match the namespace (identified by its URI), not the prefix. SAX does have a confusing way of reporting local name / namespace / prefix combinations, and it depends on whether namespace processing is even enabled.

You could also consider alternative XML processing libraries / APIs; while SAX implementations are performant, there are alternatives that are just as fast and more convenient. StAX (javax.xml.stream.*) implementations like Woodstox (and even the default one that JDK 1.6 comes with) are fast and a bit more convenient. The StaxMate library, which builds on top of StAX, is much simpler to use for both reading and writing, and is as fast as SAX implementations like Xerces. Plus, the StAX API has less baggage with respect to namespace handling, so it is easier to see the actual namespace of an element.
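For comparison, here is a minimal StAX sketch using only the JDK's javax.xml.stream API (not Woodstox or StaxMate). It assumes the same placeholder namespace URI as any test feed you pass in; the class name, method name and URI are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxNamespaceDemo {
    // Placeholder URI (assumption): replace with the URI the feed binds to the google prefix.
    static final String GOOGLE_NS = "http://example.com/ns/google";

    /** Returns the text of the first image_link element in GOOGLE_NS, or null if none. */
    static String findImageLink(String xml) {
        try {
            // StAX readers are namespace aware by default.
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && GOOGLE_NS.equals(reader.getNamespaceURI())
                        && "image_link".equals(reader.getLocalName())) {
                    return reader.getElementText(); // valid for text-only elements
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss xmlns:google=\"" + GOOGLE_NS + "\"><item>"
                   + "<google:image_link>http://example.com/me.jpg</google:image_link>"
                   + "</item></rss>";
        System.out.println(findImageLink(xml));
    }
}
```

Because this is a pull API, the namespace URI and local name are read directly from the reader at the current event, with no separate qName parameter to reconcile.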

As user polygenelubricants said: generally the parser needs to be namespace aware to handle elements that belong to a namespace bound to a prefix (like that <google:image_link> element).

This needs to be set as a "parser feature", which AFAIK can be done in a few different ways: the XMLReader interface itself has a setFeature() method that can be used to set features on a particular parser, but you can also use the same method on the SAXParserFactory class so that the factory produces parsers with those features already on or off. The SAX2 standard feature flags should be listed on the SAX project's website, and at least some of them are also listed in the Java API documentation of the package org.xml.sax.
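The two routes described above might look roughly like this. The feature name is the standard SAX2 namespaces flag; the class and method names are hypothetical, and the sketch assumes the JDK's bundled parser, which accepts this feature on both the factory and the reader.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class SaxFeatureDemo {
    // Standard SAX2 feature flag controlling namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Route 1: set the feature on the factory so every parser it creates has it on. */
    static XMLReader readerViaFactory() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(NAMESPACES, true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Route 2: set the feature on one particular XMLReader before parsing. */
    static XMLReader readerViaReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the flag back from a reader. */
    static boolean namespacesEnabled(XMLReader reader) {
        try {
            return reader.getFeature(NAMESPACES);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namespacesEnabled(readerViaFactory()));
        System.out.println(namespacesEnabled(readerViaReader()));
    }
}
```

With JAXP, factory.setNamespaceAware(true) is the more common spelling of route 1.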

For simple documents you can try to take a shortcut. If you don't actually care about namespaces and element names as a URI + local-name combination, and you can trust that the elements you are looking for (and only those) always have a certain prefix, and that there are no elements from other namespaces with the same local name, then you might solve your problem simply by using the qName parameter of the startElement() method instead of localName (or vice versa), or by adding/dropping the prefix in the tag name string you compare against.
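A sketch of that shortcut, assuming a parser that is not namespace aware (the JAXP default) and that reports the prefixed name in qName, as the JDK's bundled parser does. The class and method names are invented, and the approach breaks as soon as the feed uses a different prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class QNameShortcutDemo {
    /** Returns the text of the first element whose prefixed name equals wantedQName, or null. */
    static String findByQName(String xml, final String wantedQName) {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false); // explicit here, although it is the JAXP default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    // Compare the raw prefixed tag name, e.g. "google:image_link".
                    if (result[0] == null && wantedQName.equals(qName)) {
                        result[0] = text.toString().trim();
                    }
                    text.setLength(0);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xml = "<rss xmlns:google=\"http://example.com/ns/google\"><item>"
                   + "<google:image_link>http://example.com/me.jpg</google:image_link>"
                   + "</item></rss>";
        System.out.println(findByQName(xml, "google:image_link"));
    }
}
```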

According to the Java specs, the contents of the parameters namespaceUri, qname and localName are actually optional, and AFAIK they might be null (or empty) for this reason. Which of them are missing depends on the aforementioned "parser features" that affect namespaces. I don't know whether the parameter that is missing can vary between elements in a namespace and elements without one; I haven't investigated that behaviour.
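One way to settle this for a given parser is simply to record what it reports in each mode. The sketch below does that for a tiny document; the class name, method name and the urn:example:g namespace are made up. What the non-namespace-aware run reports for uri and localName is implementation dependent, so treat its output as something to inspect rather than rely on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNameReportDemo {
    /** Records the three name parameters of every startElement call, in document order. */
    static List<String> report(String xml, boolean namespaceAware) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    seen.add("uri=[" + uri + "] localName=[" + localName + "] qName=[" + qName + "]");
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<item xmlns:g=\"urn:example:g\"><title>t</title><g:image_link>x</g:image_link></item>";
        for (boolean aware : new boolean[] {true, false}) {
            System.out.println("namespaceAware=" + aware);
            for (String line : report(xml, aware)) {
                System.out.println("  " + line);
            }
        }
    }
}
```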

PS. XML is case sensitive, so ideally you don't need to ignore case when comparing tag name strings.
-First post, yay!

This might help someone using the Android SAX util. I was trying geo:lat to get the lat element from the geo namespace.

Sample XML:

<item> 
 <title>My Item title</title> 
 <geo:lat>40.720741</geo:lat> 
</item>

First attempt returned null:

item.getChild("geo:lat");

As suggested above, I found that passing the namespace URI to the getChild method worked.

item.getChild("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat");

Using the startPrefixMapping method of my XML handler, I was able to parse out the text of elements in a namespace.

I placed several calls to this method beneath my handler instantiation.

GoogleReader xmlhandler = new GoogleReader();
xmlhandler.startPrefixMapping("dc", "http://purl.org/dc/elements/1.1/");

where dc is the namespace prefix, as in <dc:author>some text</dc:author>
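For context, startPrefixMapping is declared on SAX's ContentHandler as a callback, and a namespace-aware parser normally invokes it itself whenever it meets an xmlns declaration. The sketch below only records those callbacks to show which URI each prefix is bound to; it is not the GoogleReader handler from this answer, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PrefixMappingDemo {
    /** Returns the prefix -> namespace URI bindings the parser reports while reading xml. */
    static Map<String, String> collectPrefixes(String xml) {
        final Map<String, String> prefixes = new LinkedHashMap<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // prefix mapping events need namespace processing
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    // Called by the parser for each xmlns declaration that comes into scope.
                    prefixes.put(prefix, uri);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        String xml = "<rss xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                   + "<dc:author>some text</dc:author></rss>";
        System.out.println(collectPrefixes(xml));
    }
}
```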
