
Handling exceptions from urllib2 and mechanize in Python

I am a novice at exception handling. I am using the mechanize module to scrape several websites. My program fails frequently because the connection is slow and requests time out. I would like to retry a website (on a timeout, for instance) up to 5 times, with a 30-second delay between tries.

I looked at this stackoverflow answer and can see how to handle various exceptions. I can also see (although it looks very clumsy) how to put the try/except inside a while loop to control the 5 attempts, but I do not understand how to break out of the loop, or "continue", when the connection succeeds and no exception has been thrown.

import mechanize
from mechanize import Browser
import time

b = Browser()
tried=0
while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
    if isinstance(e,mechanize.HTTPError):
      print e.code
      tried += 1
      time.sleep(30)
      if tried > 4:
        exit()
    else:
      print e.reason.args
      tried += 1
      time.sleep(30)
      if tried > 4:
        exit()

print "How can I get to here after the first successful b.open() attempt????"

I would appreciate advice on (1) how to break out of the loop after a successful open, and (2) how to make the whole block less clumsy and more elegant.

Your first question can be solved with break:

while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
    break
  except #etc...

The real question, however, is whether you really want to: this is what is known as "spaghetti code". If you try to graph the execution path through the program, it looks like a plate of spaghetti.

The real (imho) problem you are having is that your logic for exiting the while loop is flawed. Rather than trying to stop after a number of attempts (a condition that never occurs, because you're already exiting anyway), loop until you've got a connection:

import time
import mechanize

b = mechanize.Browser()
tried = 0
connected = False
while not connected:
    try:
        r = b.open('http://www.google.com/foobar')
        connected = True # if line above fails, this is never executed
    except mechanize.HTTPError as e:
        print e.code
        tried += 1
        if tried > 4:
            exit()
        time.sleep(30)

    except mechanize.URLError as e:
        print e.reason.args
        tried += 1
        if tried > 4:
            exit()
        time.sleep(30)

#Do stuff
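
One possible way to tidy this up further is to pull the retry logic into a small helper. The sketch below is only an illustration and not part of mechanize: the name open_with_retries and all of its parameters are made up here, and open_func stands in for something like b.open. It catches IOError by default on the assumption that the errors of interest derive from it (urllib2.URLError does); pass a different tuple via errors if that does not hold for your setup.

```python
import time


def open_with_retries(open_func, url, attempts=5, delay=30,
                      errors=(IOError,), sleep=time.sleep):
    """Call open_func(url), retrying when one of `errors` is raised.

    Returns the first successful result; re-raises the last error
    once all attempts have failed. (Hypothetical helper.)
    """
    for attempt in range(1, attempts + 1):
        try:
            return open_func(url)  # success ends the loop right here
        except errors as e:
            print("attempt %d failed: %s" % (attempt, e))
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)  # wait before the next try
```

With mechanize this might be called as r = open_with_retries(b.open, 'http://www.google.com/foobar'). Re-raising the last error instead of calling exit() leaves the decision about what to do next to the caller.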

For your first question, you simply want the "break" keyword, which breaks out of a loop.

For the second question, you can attach several "except" clauses to one "try", one for each kind of exception. This replaces your isinstance() check and makes your code cleaner.
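
To illustrate how the clauses are chosen, here is a small sketch using the standard library's HTTPError and URLError (urllib2 on Python 2, urllib.error on Python 3). The describe function and the example URL are made up for illustration. Because HTTPError is a subclass of URLError, the more specific clause has to come first; otherwise the URLError clause would catch both kinds.

```python
try:
    from urllib.error import HTTPError, URLError  # Python 3
except ImportError:
    from urllib2 import HTTPError, URLError  # Python 2


def describe(exc):
    """Raise exc and report which except clause handled it."""
    try:
        raise exc
    except HTTPError as e:   # most specific clause first
        return "HTTP error %d" % e.code
    except URLError as e:    # any other URL-level failure
        return "URL error: %s" % (e.reason,)


# Hypothetical errors built by hand, just to exercise both clauses.
print(describe(HTTPError("http://example.invalid/", 503,
                         "Service Unavailable", None, None)))
print(describe(URLError("timed out")))
```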

You don't have to repeat the things in the except block that you do in either case.

import mechanize
import time

b = mechanize.Browser()
tried=0
while True:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
    tried += 1
    if isinstance(e,mechanize.HTTPError):
      print e.code
    else:
      print e.reason.args
    if tried > 4:
      exit()
    time.sleep(30)
    continue
  break

Also, you may be able to use while not r: as the loop condition (with r set to None before the loop), depending on what Browser.open returns.
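
A minimal sketch of that idea, under a few assumptions: fetch and max_tries are hypothetical names, open_func stands in for Browser.open, and failures are assumed to surface as IOError. It tests r is None rather than not r, so it does not depend on whether the returned response object happens to be truthy. The delay between tries is left out to keep the sketch short.

```python
def fetch(open_func, url, max_tries=5):
    """Loop until open_func(url) yields a response or tries run out."""
    r = None      # must exist before the loop condition is first checked
    tried = 0
    while r is None and tried < max_tries:
        try:
            r = open_func(url)   # on success r is set and the loop ends
        except IOError:
            tried += 1           # r stays None, so the loop goes around again
    return r      # still None if every attempt failed
```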

Edit: roadierich showed a more elegant way with

try:
  doSomething()
  break
except:
  ...

This works because an error skips straight to the except block, so break only runs when no exception was raised.
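
A related idiom, shown here only as a sketch: a try statement can carry an else clause that runs when no exception was raised, and a for loop can carry an else clause that runs when the loop finished without break. The function name try_up_to and its arguments are made up for illustration, and catching Exception is a simplification; in real code you would list the specific errors you expect.

```python
def try_up_to(n, do_something):
    """Call do_something() up to n times; return its first successful result."""
    for attempt in range(n):
        try:
            result = do_something()
        except Exception as e:
            print("attempt %d failed: %s" % (attempt + 1, e))
        else:
            break   # no exception was raised: stop retrying
    else:
        # The loop ran to completion without break, so nothing succeeded.
        raise RuntimeError("all %d attempts failed" % n)
    return result
```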

