Setting timeouts to parse webpages using Python lxml

I am using the Python lxml library to parse HTML pages:

import lxml.html

# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')

Is there any way to set a timeout for parsing?

It looks to be using urllib.urlopen as the opener, but the easiest way to do this would be to just change the default timeout that the socket module applies to new sockets:

import socket
timeout = 10
socket.setdefaulttimeout(timeout)

Of course, this is a quick-and-dirty solution: the default applies to every socket created afterwards in the process, not just to this one request.
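If the global side effect is a concern, here is a rough sketch of two narrower options, assuming Python 3 (so urllib.request rather than the older urllib.urlopen). The helper names default_socket_timeout and parse_with_timeout are made up for illustration and are not part of lxml. The first restores the previous default once you are done; the second fetches the page yourself with an explicit timeout and passes the open response to lxml.html.parse, which accepts file-like objects, so it does not rely on how lxml opens URLs internally.

```python
import contextlib
import socket
import urllib.request


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily change the process-wide default socket timeout.

    Hypothetical helper: it wraps the socket.setdefaulttimeout() call
    from the answer so the old value is restored afterwards.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the earlier default, even if the body raised.
        socket.setdefaulttimeout(previous)


def parse_with_timeout(url, seconds=10):
    """Fetch `url` with an explicit timeout and parse it with lxml.

    Hypothetical helper: the request is made by urllib.request, and
    lxml only sees the already-open, file-like response.
    """
    # Imported here so default_socket_timeout() above can be used
    # on its own, without lxml installed.
    import lxml.html

    with urllib.request.urlopen(url, timeout=seconds) as response:
        return lxml.html.parse(response)
```

Usage would look like page = parse_with_timeout('http://stackoverflow.com/', seconds=10). Note that the timeout bounds each blocking socket operation (connecting, each read) rather than the total download time, and a timeout surfaces as urllib.error.URLError or socket.timeout, so you will probably want to catch both.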
