
Timing out urllib2 urlopen operation in Python 2.4

I've just inherited some Python code and need to fix a bug as soon as possible. I have very little Python knowledge, so please excuse my ignorance. I am using urllib2 to extract data from web pages. Despite calling socket.setdefaulttimeout(30), I still come across URLs that hang, seemingly indefinitely.

I want to time out the extraction and, after much searching of the web, have got this far:

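import socket
import urllib2
from threading import Timer

socket.setdefaulttimeout(30)

reqdata = urllib2.Request(urltocollect)

def handler(reqdata):
    # ????  reqdata.close() ????
    pass

# Call handler after 5 seconds unless the fetch finishes first.
t = Timer(5.0, handler, [reqdata])
t.start()
urldata = urllib2.urlopen(reqdata)
t.cancel()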

The handler function fires after the time has passed, but I don't know how to make it stop the urlopen operation.

Any guidance would be gratefully received.

C

UPDATE: In my experience, urllib2.urlopen hangs and waits indefinitely on certain URLs. These are URLs that never resolve when opened in a browser: the browser just waits with the activity indicator moving and never fully connects. I suspect that these URLs may be stuck in some kind of infinitely looping redirect. Neither the timeout argument to urlopen (in later versions of Python) nor the global socket.setdefaulttimeout() setting detects this on my system.

I tried a number of solutions, but in the end I upgraded to Python 2.7 and used a variation of Werner's answer below. Thanks, Werner.

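It's right there in the function signature. Note that the timeout argument was added in Python 2.6, so it needs a newer interpreter than 2.4: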

urllib2.urlopen(url[, data][, timeout])

For example:

urllib2.urlopen("http://www.google.com", None, 5)  # data=None, timeout=5 seconds
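As a minimal sketch of how that argument might be used in practice, the helper below wraps the call and treats a timeout as a soft failure. The name `fetch_with_timeout` and the decision to return `None` are illustrative assumptions, not part of urllib2. The import fallback is there so the same code runs on Python 2.6/2.7 (`urllib2`) and on Python 3 (`urllib.request`); it will not work on 2.4, which lacks the `timeout` argument.

```python
import socket

try:
    from urllib2 import urlopen, URLError  # Python 2.6 / 2.7
except ImportError:
    from urllib.request import urlopen     # Python 3
    from urllib.error import URLError


def fetch_with_timeout(url, seconds):
    """Return the response body, or None if the request fails or times out.

    Hypothetical helper, for illustration only.
    """
    try:
        response = urlopen(url, timeout=seconds)
        try:
            return response.read()
        finally:
            response.close()
    except (URLError, socket.timeout):
        # URLError: connection-level failure, including a connect timeout.
        # socket.timeout: no data arrived within `seconds` while reading.
        return None
```

A call such as `fetch_with_timeout("http://www.google.com", 5)` would then give up after roughly five seconds of silence. The timeout bounds each blocking socket operation rather than the whole download, so a server that keeps trickling data can still hold the call open for longer than `seconds`.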

You can achieve this using signals.

Here's an example of my signal decorator, which you can use to set a timeout on individual functions.

P.S. I'm not sure whether this is syntactically correct for 2.4. I'm using 2.6, but 2.4 does support signals.

import signal
import time

class TimeOutException(Exception):
    pass

def handler(signum, frame):
    # Called when SIGALRM arrives; the exception surfaces in the main thread.
    raise TimeOutException("Timeout")

def timeout(seconds):
    # SIGALRM is Unix-only and can only be set from the main thread.
    def fn(f):
        def wrapped_fn(*args, **kwargs):
            signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds)
            try:
                return f(*args, **kwargs)
            finally:
                # Cancel the pending alarm so it cannot fire after f returns.
                signal.alarm(0)
        return wrapped_fn
    return fn

@timeout(5)
def my_function_that_takes_long(time_to_sleep):
    time.sleep(time_to_sleep)

if __name__ == '__main__':
    print('Calling function that takes 2 seconds')
    try:
        my_function_that_takes_long(2)
    except TimeOutException:
        print('Timed out')

    print('Calling function that takes 10 seconds')
    try:
        my_function_that_takes_long(10)
    except TimeOutException:
        print('Timed out')
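Signals only work on Unix and only in the main thread. Where that is a problem, one alternative (a different technique from the signal decorator, sketched here under assumptions) is to run the blocking call in a daemon thread and stop waiting for it after a deadline. The names `call_with_deadline` and `DeadlineExceeded` are hypothetical, and the code assumes Python 2.6 or later. The stuck call is abandoned, not killed: the worker thread keeps running in the background until the call returns or the process exits.

```python
import threading


class DeadlineExceeded(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical name)."""


def call_with_deadline(seconds, func, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread, waiting at most `seconds`."""
    outcome = {}

    def worker():
        # Store either the return value or the exception for the caller.
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the interpreter alive for a stuck call
    thread.start()
    thread.join(seconds)

    if thread.is_alive():
        # The call is still blocked; it cannot be cancelled, only abandoned.
        raise DeadlineExceeded("no result after %s seconds" % seconds)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With urllib2 this might be used as `call_with_deadline(30, urllib2.urlopen, url)`, catching `DeadlineExceeded` to skip the URL. Because abandoned workers still hold their sockets, this suits a short-lived script better than a long-running service.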
