Setting a timeout for parsing web pages with Python lxml

Question

I am using the Python lxml library to parse HTML pages:

import lxml.html

# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')

Is there any way to set a timeout for parsing?

Answer 1

It looks like lxml uses urllib.urlopen as the opener, but the easiest way to do this is simply to change the default socket timeout:

import socket
timeout = 10
socket.setdefaulttimeout(timeout)

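Of course, this is a quick-and-dirty solution: socket.setdefaulttimeout changes the default for every socket created afterwards in the process, not just the one used for this request.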
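
One way to make the global change less intrusive is to restore the previous default once the work is done. The following is a minimal sketch using only the standard library; the helper name temporary_socket_timeout is hypothetical and not part of lxml or the socket module.

```python
import socket
from contextlib import contextmanager


@contextmanager
def temporary_socket_timeout(seconds):
    """Set the process-wide default socket timeout, then restore the old value.

    Hypothetical helper for illustration; not provided by lxml or socket.
    """
    previous = socket.getdefaulttimeout()  # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the earlier default even if the body raised an exception.
        socket.setdefaulttimeout(previous)
```

The parsing call would go inside a "with temporary_socket_timeout(10):" block. The default is still process-wide while the block is active, so sockets created by other threads during that time pick it up too. It also only affects sockets created through Python's socket module, so if lxml does not fetch the URL that way, the timeout will not apply.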

Answer 1
1 Accepted 2010-05-05 02:55:57
