
Python3, urllib.request, urlopen() is timing out

I'm using urlopen() to open a website and pull (financial) data from it. Here is my line:

sourceCode = urlopen('xxxxxxxx').read()

After this, I pull out the data I need. I loop through different pages on the same domain to pull data (stock info). I end the body of the loop with:

time.sleep(1)

as I'm told that keeps the site from blocking me. My program will run for a few minutes, but at some point it stalls and stops pulling data. I can rerun it, and it'll run for another arbitrary amount of time and then stall.

Is there something I can do to prevent this?

This worked (for most websites) for me:

If you're using the urllib.request library, you can create a Request and spoof the user agent. This might stop the site from blocking you.

from urllib.request import Request, urlopen

# path is the URL of the page to fetch
req = Request(path, headers={'User-Agent': 'Mozilla/5.0'})
data = urlopen(req).read()
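If the stall turns out to be a hung connection rather than a block, passing a timeout to urlopen() may also help, since without one a dead connection can wait indefinitely. Here is a minimal sketch that combines the spoofed user agent with a timeout and a few retries. The fetch() helper, its parameter names, and the default values are assumptions for illustration, not something from the question:

```python
import time
from urllib.request import Request, urlopen


def fetch(url, timeout=10, retries=3, pause=1.0, opener=urlopen):
    """Return the response body as bytes, or None if every attempt fails.

    Hypothetical helper: the name, defaults, and the injectable `opener`
    argument are assumptions made for this sketch.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(retries):
        try:
            # timeout is in seconds; a stalled connection raises an
            # exception instead of blocking the loop forever
            with opener(req, timeout=timeout) as response:
                return response.read()
        except OSError as exc:
            # URLError, socket timeouts, and connection resets are all
            # subclasses of OSError
            print(f'attempt {attempt + 1} failed: {exc}')
            time.sleep(pause)
    return None
```

Inside the scraping loop, sourceCode = fetch(url) would then replace the bare urlopen(url).read() call, and a None result can be skipped or logged instead of hanging the whole run.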

Hope this helps.
