
How do I catch all possible errors with url fetch (python)?

In my application, users enter a URL and I try to open the link and get the title of the page. But I realized that many different kinds of errors can occur, including Unicode characters or newlines in titles, as well as AttributeError and IOError. I first tried to catch each error individually, but now, whenever a URL fetch fails, I want to redirect to an error page where the user can enter the title manually. How do I catch all possible errors? This is the code I have now:

    title = "title"

    try:

        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")     
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:    
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:        
        self.redirect("/urlparseerror?error=AttributeError")

    #https url:    
    except IOError:        
        self.redirect("/urlparseerror?error=IOError")


    #I tried this else clause to catch any other error
    #but it does not work
    #this is executed when none of the errors above is true:
    #
    #else:
    #    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")

UPDATE

As suggested by @Wooble in the comments, I added try...except around the code that writes the title to the database:

        try:
            new_item = Main(
                        ....
                        title = unicode(title, "utf-8"))

            new_item.put()

        except UnicodeDecodeError:    

            self.redirect("/urlparseerror?error=UnicodeDecodeError")

This works, although according to the logging info the out-of-range character — is still in title:

    title: 7.2. re — Regular expression operations — Python v2.7.1 documentation

Do you know why?

You can use except without specifying any type to catch all exceptions.

From the Python docs (http://docs.python.org/tutorial/errors.html):

    import sys

    try:
        f = open('myfile.txt')
        s = f.readline()
        i = int(s.strip())
    except IOError as (errno, strerror):
        print "I/O error({0}): {1}".format(errno, strerror)
    except ValueError:
        print "Could not convert data to an integer."
    except:
        print "Unexpected error:", sys.exc_info()[0]
        raise

The last except catches any exception that has not been caught before (i.e. an exception that is neither an IOError nor a ValueError).
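As a rough illustration of that ordering, here is a minimal sketch in Python 3 syntax (the code in this thread is Python 2). The function name `parse_int` and its return strings are made up for the example. It also shows that an `else` clause on a `try` statement runs only when no exception was raised, so it cannot serve as a catch-all.

```python
def parse_int(text):
    """Try to parse text as an integer and report which branch handled it."""
    try:
        value = int(text.strip())
    except ValueError:
        # Specific handler: the text was a string but not a valid integer.
        return "ValueError"
    except Exception as exc:
        # Catch-all: reached only if no earlier handler matched.
        return "unexpected " + type(exc).__name__
    else:
        # Runs only when the try block raised nothing at all.
        return "ok %d" % value


print(parse_int(" 42\n"))
print(parse_int("abc"))
print(parse_int(None))
```

The third call fails inside the `try` block with an AttributeError, because `None` has no `strip` method, and so it falls through to the final handler.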

You can add a final handler for the base exception type Exception, which catches any exception (other than system-exiting ones such as KeyboardInterrupt and SystemExit) that an earlier clause has not caught.

http://docs.python.org/library/exceptions.html#exception-hierarchy

    try:
        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:
        self.redirect("/urlparseerror?error=AttributeError")

    # https url:
    except IOError:
        self.redirect("/urlparseerror?error=IOError")

    # Catch-all for anything the handlers above did not match
    except Exception as ex:
        print "Exception caught: %s" % ex.__class__.__name__
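If every failure should end at the same error page anyway, the separate handlers could probably be collapsed into one. The sketch below is one possible way to do that, written in Python 3 with only the standard library rather than the BeautifulSoup and urllib calls used above. `TitleParser`, `get_title` and the `fetch` parameter are hypothetical names for illustration; `fetch` stands in for whatever function actually downloads the page, so the example runs without network access.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def get_title(url, fetch):
    """Return (title, None) on success or (None, error_name) on any failure.

    fetch is a hypothetical callable that takes a URL and returns HTML text.
    """
    try:
        parser = TitleParser()
        parser.feed(fetch(url))
        parser.close()
        title = "".join(parser.parts).strip()
        if not title:
            raise ValueError("no title found")
        return title, None
    except Exception as exc:
        # Single catch-all: the caller decides what to do with the name.
        return None, type(exc).__name__


def fake_fetch(url):
    # Stand-in for a real download, used only for this example.
    return "<html><head><title>\r\n Example page \r\n</title></head></html>"


print(get_title("http://example.com/", fake_fetch))
```

A request handler could then redirect to "/urlparseerror?error=" followed by the returned error name whenever the title comes back as None, and otherwise store the title as before.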
