
Java: error while parsing an RSS feed

The code is shown below.

public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse("http://rss.adnkronos.com/RSS_Politica.xml");

    NodeList nodes = doc.getElementsByTagName("title");

    for (int k = 0; k < nodes.getLength(); k++) {
        System.out.print(nodes.item(k));
    }
}

The RSS feed is at the following link: http://rss.adnkronos.com/RSS_Politica.xml

The result (in the console) is the following:

null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null

As you can see in the XML, the values of the title nodes are obviously not null.

After the result, the following errors are shown (translated from Italian).

Error: URI=http://rss.adnkronos.com/RSS_Politica.xml Line=1: The root element "rss" must match the root DOCTYPE "null".

Error: URI=http://rss.adnkronos.com/RSS_Politica.xml Line=1: Document is invalid: no grammar found.

Look into the validation options for the errors you are getting. As for the nulls printed for the titles, it seems that toString() on Node either returns null or does something that ends up yielding null. If you change the call to System.out.print(nodes.item(k).getTextContent()); it will print out the titles.

There are two problems. Let's deal first with the one you probably care most about.

The nodes in your NodeList are Element nodes. The actual Text nodes are their children. So to get the values you want, you can do:

nodes.item(k).getFirstChild().getNodeValue()

Or (in this case):

nodes.item(k).getTextContent()

Personally I think the former is slightly more robust for general parsing, because getTextContent() concatenates the text content of all the child nodes if there happens to be more than one.
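To make the difference concrete, here is a minimal sketch that parses a made-up fragment in which a title contains a nested element. The class name TitleTextDemo, the helper firstChildValue, and the sample XML are illustrations only and are not taken from the feed. The helper also guards against an empty element, where getFirstChild() returns null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TitleTextDemo {

    // Hypothetical helper: value of the first child node, or null for an empty element.
    static String firstChildValue(Node element) {
        Node child = element.getFirstChild();
        return child == null ? null : child.getNodeValue();
    }

    // Parses an XML string with a default (non-validating) DocumentBuilder.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: the title has a Text child followed by a <b> element.
        Document doc = parse("<item><title>Hello <b>world</b></title></item>");
        Node title = doc.getElementsByTagName("title").item(0);

        System.out.println(firstChildValue(title));  // prints "Hello " (first Text child only)
        System.out.println(title.getTextContent());  // prints "Hello world" (all descendants)
    }
}
```

For a plain title that holds nothing but text, both calls return the same string.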

As for the validation errors: when you call setValidating(true), the parser looks for a DTD declared in the document, which is not there, and it complains to you about it. The tl;dr is to call setValidating(false).
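Putting both fixes together, a corrected version of the program might look like the sketch below. The class name RssTitles, the method titles, and the inline sample feed are assumptions made so the example runs without network access. To read the real feed, pass new InputSource("http://rss.adnkronos.com/RSS_Politica.xml") instead. The call to setIgnoringElementContentWhitespace(true) is dropped because, per its Javadoc, it only takes effect in validating mode.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssTitles {

    // Returns the text of every <title> element in the given XML source.
    static List<String> titles(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // the default; stated explicitly to show the fix
        DocumentBuilder builder = factory.newDocumentBuilder();

        Document doc = builder.parse(source);
        NodeList nodes = doc.getElementsByTagName("title");

        List<String> result = new ArrayList<>();
        for (int k = 0; k < nodes.getLength(); k++) {
            // Read the element's text instead of printing the Node itself.
            result.add(nodes.item(k).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up stand-in for the real feed, so the sketch runs offline.
        String sample = "<rss version=\"2.0\"><channel><title>Politica</title>"
                + "<item><title>First story</title></item></channel></rss>";

        for (String title : titles(new InputSource(new StringReader(sample)))) {
            System.out.println(title);
        }
    }
}
```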

If you really want to validate the RSS, you should try to find an unofficial XSD schema file (there is no official one) and set that up in your DocumentBuilderFactory. Using an XSD for RSS in this context is probably not worthwhile, though, because half the RSS on the Internet, while perfectly usable, would probably fail validation :).
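For completeness, here is a rough sketch of how a schema can be attached to a DocumentBuilderFactory through the standard javax.xml.validation API. The inline TOY_XSD is a deliberately tiny, made-up schema and not a real RSS schema. In practice you would load whichever unofficial XSD you decide to trust. The class and method names are also just illustrations.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XsdValidationSketch {

    // Toy schema (assumption, NOT real RSS): <rss> holds one <channel> with one <title>.
    static final String TOY_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='rss'><xs:complexType><xs:sequence>"
        + "<xs:element name='channel'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Parses the XML and throws a SAXException if it violates the schema.
    static Document parseValidated(String xml) throws Exception {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(TOY_XSD)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation expects a namespace-aware parser
        factory.setValidating(false);    // this flag only controls DTD validation
        factory.setSchema(schema);

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Without an ErrorHandler, validation errors are reported but do not stop parsing.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        parseValidated("<rss><channel><title>Politica</title></channel></rss>");
        System.out.println("valid document accepted");
        try {
            parseValidated("<rss><channel/></rss>"); // missing <title>
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that setValidating stays false here: that flag is about DTDs, while the schema set with setSchema is checked independently.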
