How do I catch all possible errors with URL fetch (Python)?

Question

In my application, users enter a URL, and I try to open the link and get the title of the page. But I realized that many different kinds of errors can occur, including Unicode characters or newlines in titles, as well as AttributeError and IOError. I first tried to catch each error individually, but now, in case of a URL fetch error, I want to redirect to an error page where the user enters the title manually. How do I catch all possible errors? This is the code I have now:

    title = "title"

    try:

        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")     
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:    
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:        
        self.redirect("/urlparseerror?error=AttributeError")

    #https url:    
    except IOError:        
        self.redirect("/urlparseerror?error=IOError")


    #I tried this else clause to catch any other error
    #but it does not work
    #this is executed when none of the errors above is true:
    #
    #else:
    #    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")
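
For reference, the `else` clause of a `try` statement runs only when no exception was raised, which is why the commented-out attempt above does not act as a catch-all. A minimal sketch of the control flow, written in Python 3 syntax (an assumption, since the code above is Python 2); the function name `parse_int` is hypothetical and exists only for illustration:

```python
def parse_int(text):
    """Report which clause of try/except/else handles each input."""
    try:
        value = int(text)
    except ValueError:
        # A specific error that was anticipated.
        return "except ValueError"
    except Exception:
        # Any other ordinary error, e.g. TypeError for int(None).
        return "except Exception"
    else:
        # Reached only when the try body raised nothing.
        return "else (no error), value=%d" % value


print(parse_int("42"))
print(parse_int("abc"))
print(parse_int(None))
```

A catch-all therefore belongs in a final `except` clause, not in `else`.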

UPDATE

As suggested by @Wooble in the comments, I added try...except around writing the title to the database:

        try:
            new_item = Main(
                        ....
                        title = unicode(title, "utf-8"))

            new_item.put()

        except UnicodeDecodeError:    

            self.redirect("/urlparseerror?error=UnicodeDecodeError")

This works. However, the out-of-range character â€” is still in title according to the logging info:

***title: 7.2. re â€” Regular expression operations &mdash; Python v2.7.1 documentation**

Do you know why?

Answer 1

You can use except without specifying any exception type to catch all exceptions.

From the Python docs (http://docs.python.org/tutorial/errors.html):

    import sys

    try:
        f = open('myfile.txt')
        s = f.readline()
        i = int(s.strip())
    except IOError as (errno, strerror):
        print "I/O error({0}): {1}".format(errno, strerror)
    except ValueError:
        print "Could not convert data to an integer."
    except:
        print "Unexpected error:", sys.exc_info()[0]
        raise

The last except catches any exception that has not been caught before (i.e. an exception that is neither an IOError nor a ValueError).
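
One detail worth knowing about a bare `except:` is that it also catches `KeyboardInterrupt` and `SystemExit`, which do not derive from `Exception`. The sketch below illustrates the difference; it uses Python 3 syntax (an assumption, since the thread's code is Python 2), and the helper name `classify` is hypothetical:

```python
def classify(exc):
    """Raise exc and report which kind of handler catches it."""
    try:
        try:
            raise exc
        except Exception:
            # Ordinary errors (IOError, ValueError, ...) stop here.
            return "except Exception"
    except:  # bare except: also sees BaseException subclasses
        return "bare except"


print(classify(IOError("disk")))
print(classify(KeyboardInterrupt()))
print(classify(SystemExit(1)))
```

This is one reason the docs example re-raises in the bare `except:` block: the error is reported, but an interrupt or exit request is not silently swallowed.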

Answer 2

You can use the top-level exception type Exception, which will catch any exception that has not been caught before.

http://docs.python.org/library/exceptions.html#exception-hierarchy

    try:

        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:
        self.redirect("/urlparseerror?error=AttributeError")

    #https url:
    except IOError:
        self.redirect("/urlparseerror?error=IOError")

    except Exception, ex:
        print "Exception caught: %s" % ex.__class__.__name__
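
Since the goal is to fall back to manual entry on any failure, the whole fetch-and-parse step could also be wrapped in a single helper that returns `None` when anything goes wrong. The following is only a sketch under several assumptions: it uses Python 3 and the standard library's `html.parser` instead of the Python 2 `urllib` and BeautifulSoup calls above, it assumes the page is UTF-8 encoded, and the names `TitleParser`, `default_fetch`, and `get_title` are hypothetical.

```python
import logging
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def default_fetch(url):
    # Performs the network request and returns the raw response bytes.
    with urlopen(url, timeout=10) as response:
        return response.read()


def get_title(url, fetch=default_fetch):
    """Return the stripped page title, or None if any step fails."""
    try:
        raw = fetch(url)
        parser = TitleParser()
        parser.feed(raw.decode("utf-8"))  # assumption: page is UTF-8
        parser.close()
        title = "".join(parser.parts).strip()
        return title or None
    except Exception:
        # Catch-all for fetch, decode, and parse errors; log and give up.
        logging.exception("Could not get a title for %s", url)
        return None
```

The request handler then needs only one check: if `get_title(url)` returns `None`, redirect to `/urlparseerror`. The `fetch` parameter exists so the helper can be exercised without network access.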

Answer 1: 2 (accepted), 2011-03-05 23:32:00

Answer 2: 2, 2011-03-05 23:56:49
