Trying to parse feeds in an RSS reader written in Python

Question

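I'm still a Python beginner. As a practice project, I want to write my own RSS reader. I found a helpful tutorial here: Learning Python. I used the code provided in that tutorial: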

#! /usr/bin/env python    
import urllib2
from xml.dom import minidom, Node

""" Get the XML """
url_info = urllib2.urlopen('http://rss.slashdot.org/Slashdot/slashdot')

if (url_info):
    """ We have the RSS XML lets try to parse it up """
    xmldoc = minidom.parse(url_info)
    if (xmldoc):
        """We have the Doc, get the root node"""
        rootNode = xmldoc.documentElement
        """ Iterate the child nodes """
        for node in rootNode.childNodes:
            """ We only care about "item" entries"""
            if (node.nodeName == "item"):
                """ Now iterate through all of the <item>'s children """
                for item_node in node.childNodes:
                    if (item_node.nodeName == "title"):
                        """ Loop through the title Text nodes to get
                        the actual title"""
                        title = ""
                        for text_node in item_node.childNodes:
                            if (text_node.nodeType == node.TEXT_NODE):
                                title += text_node.nodeValue
                        """ Now print the title if we have one """
                        if (len(title)>0):
                            print title

                    if (item_node.nodeName == "description"):
                        """ Loop through the description Text nodes to get
                        the actual description"""
                        description = ""
                        for text_node in item_node.childNodes:
                            if (text_node.nodeType == node.TEXT_NODE):
                                description += text_node.nodeValue
                        """ Now print the title if we have one.
                        Add a blank with \n so that it looks better """
                        if (len(description)>0):
                            print description + "\n"
    else:
        print "Error getting XML document!"
else:
    print "Error! Getting URL"

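Everything worked as expected, and at first I thought I had understood it all. But as soon as I use a different RSS feed (for example "http://www.spiegel.de/schlagzeilen/tops/index.rss"), my application gets a "terminated" message from the Eclipse IDE. I can't make sense of that message, because I cannot tell where exactly the application terminates or why. The debugger is not much help either, since it ignores my breakpoints.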

Does anyone know what I'm doing wrong?

Answer 1

Well, the "terminated" message is not an error. It only tells you that Python exited without an error.

You are not doing anything wrong. This RSS reader just isn't very flexible, because it only knows one variant of RSS.

If you compare the XML documents from Slashdot and Spiegel Online, you will see that their structure differs:

Slashdot:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ...>
  <channel rdf:about="http://slashdot.org/">
    <title>Slashdot</title>
    <!-- more stuff (but no <item>-tags) -->
  </channel>
  <item rdf:about="blabla">
    <title>The Condescending UI</title>
    <!-- item data -->
  </item>
  <!-- more <item>-tags -->
</rdf:RDF>

Spiegel Online:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>SPIEGEL ONLINE - Schlagzeilen</title>
    <link>http://www.spiegel.de</link>
    <item>
      <title>Streit über EU-Veto: Vize Clegg meutert gegen britischen Premier Cameron</title>
    </item>
    <!-- more <item>-tags -->
  </channel>
</rss>

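In the Spiegel Online feed, all <item> elements sit inside the <channel> tag, whereas in the Slashdot feed they are direct children of the root tag (<rdf:RDF>). Your Python code only looks for items directly under the root tag.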
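To make the difference concrete, here is a minimal sketch that mimics the loop from the question and only inspects direct children of the root element. It assumes Python 3 (the question's code is Python 2), uses two tiny hand-written documents in place of the real feeds, and the helper name direct_item_titles is made up for illustration.

```python
from xml.dom import minidom, Node

# Hand-written stand-ins for the two feed layouts (not the real feeds).
RDF_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://slashdot.org/"><title>Slashdot</title></channel>
  <item rdf:about="blabla"><title>The Condescending UI</title></item>
</rdf:RDF>"""

RSS2_FEED = """<rss version="2.0">
  <channel>
    <title>SPIEGEL ONLINE - Schlagzeilen</title>
    <item><title>Some headline</title></item>
  </channel>
</rss>"""


def direct_item_titles(xml_text):
    """Collect titles of <item> elements that are direct children of the root."""
    root = minidom.parseString(xml_text).documentElement
    titles = []
    for node in root.childNodes:
        if node.nodeName != "item":
            continue
        for child in node.childNodes:
            if child.nodeName == "title":
                # Join the text nodes below <title> into one string.
                titles.append("".join(
                    t.nodeValue for t in child.childNodes
                    if t.nodeType == Node.TEXT_NODE))
    return titles


print(direct_item_titles(RDF_FEED))   # items are found at the root
print(direct_item_titles(RSS2_FEED))  # nothing found: items live inside <channel>
```

With the second layout the loop never sees an <item>, so nothing is printed and the script simply ends, which is what Eclipse reports as "terminated".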

If you want the RSS reader to work with both feeds, you could, for example, change the following line:

for node in rootNode.childNodes:

to this:

for node in rootNode.getElementsByTagName('item'):

This way, all <item> tags are enumerated, regardless of where they are located in the XML document.
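A small sketch of that change, again assuming Python 3 and two hand-written sample documents rather than the real feeds (the function name all_item_titles is hypothetical):

```python
from xml.dom import minidom, Node

# Hand-written samples: items at the root (RDF style) and inside <channel> (RSS 2.0 style).
RDF_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <item rdf:about="blabla"><title>The Condescending UI</title></item>
</rdf:RDF>"""

RSS2_SAMPLE = """<rss version="2.0">
  <channel>
    <item><title>Some headline</title></item>
  </channel>
</rss>"""


def all_item_titles(xml_text):
    """Collect <item> titles no matter how deeply the items are nested."""
    root = minidom.parseString(xml_text).documentElement
    titles = []
    # getElementsByTagName searches all descendants, not just direct children.
    for item in root.getElementsByTagName("item"):
        for child in item.childNodes:
            if child.nodeName == "title":
                titles.append("".join(
                    t.nodeValue for t in child.childNodes
                    if t.nodeType == Node.TEXT_NODE))
    return titles


print(all_item_titles(RDF_SAMPLE))
print(all_item_titles(RSS2_SAMPLE))
```

Both calls now return a title, because the search no longer depends on which element is the parent of <item>.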

Answer 2

If nothing happens, then perhaps everything in the code is correct and it just doesn't match the right elements :)

If there is an exception, try launching the script from the command line:

python <yourfilename.py>

Or use try / except to catch the exception and print the error:

try:
    pass  # your code goes here
except Exception as e:
    # print it
    print 'My exception is', e
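As a rough illustration of the same idea in Python 3 syntax, the sketch below wraps the parsing step and prints both the message and the full traceback, which shows where the failure happened. The function name parse_or_report and the deliberately broken sample string are assumptions made for this example.

```python
import traceback
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_or_report(xml_text):
    """Return the parsed document, or None after reporting the parse error."""
    try:
        return minidom.parseString(xml_text)
    except ExpatError as e:
        print("My exception is", e)
        traceback.print_exc()  # full traceback goes to stderr
        return None


# The closing tag does not match the open <channel>, so parsing fails.
doc = parse_or_report("<rss><channel></rss>")
```

Catching the specific ExpatError rather than a bare Exception is a design choice here: unrelated bugs still surface instead of being swallowed.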
