Python 3, urllib.request: urlopen() is timing out
I'm using urlopen() to open a website and pull (financial) data from it. Here is my line:
sourceCode = urlopen('xxxxxxxx').read()
After this, I extract the data I need. I loop through different pages on the same domain to pull data (stock info), and I end the body of the loop with:
time.sleep(1)
as I've been told that this keeps the site from blocking me. My program runs for a few minutes, but at some point it stalls and stops pulling data. I can rerun it, and it will run for another arbitrary amount of time and then stall again.
Is there something I can do to prevent this?
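One likely cause of the stall, though the thread does not confirm it: when urlopen() is called without a timeout argument it falls back to the global socket default, which is None (no timeout) unless changed with socket.setdefaulttimeout(). A connection the server stops answering can then block forever. A minimal sketch, assuming it is acceptable to retry a page a few times, is to pass timeout and retry with exponential backoff. fetch_with_retries and its parameters are hypothetical names for illustration; the opener parameter exists only so the function can be tested without network access.

```python
import socket
import time
from urllib.error import URLError
from urllib.request import Request, urlopen


def fetch_with_retries(url, timeout=10.0, retries=3, backoff=1.0, opener=urlopen):
    """Hypothetical helper: fetch url and return the response body as bytes.

    The timeout makes a hung connection raise instead of blocking forever;
    failed attempts are retried with exponential backoff.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    for attempt in range(retries + 1):
        try:
            # timeout is in seconds and applies to blocking socket operations
            with opener(req, timeout=timeout) as resp:
                return resp.read()
        # HTTPError is a subclass of URLError, so HTTP error statuses are retried too
        except (URLError, socket.timeout, ConnectionError):
            if attempt == retries:
                raise  # give up after the last retry
            # wait backoff * 1, 2, 4, ... seconds before the next attempt
            time.sleep(backoff * 2 ** attempt)
```

In the loop from the question, urlopen('xxxxxxxx').read() would become fetch_with_retries('xxxxxxxx'), keeping time.sleep(1) at the end of the loop body. If a page still fails after all retries, the exception propagates, so the caller can decide whether to skip that page or stop.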
This worked for me (for most websites):
If you're using the urllib.request library, you can create a Request object and spoof the User-Agent header. This may stop the site from blocking you.
from urllib.request import Request, urlopen

# path is the URL of the page to fetch
req = Request(path, headers={'User-Agent': 'Mozilla/5.0'})
data = urlopen(req).read()
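For context on why the header matters: when no User-Agent is supplied, urllib's default opener sends one of the form Python-urllib/3.x, which is easy for a site to filter. The sketch below sends no network traffic; it only compares that default with a spoofed header on a Request. The URL is a hypothetical placeholder.

```python
from urllib.request import Request, build_opener

# Hypothetical URL, used only to build the request object; nothing is sent.
url = "https://example.com/quote"

# Without a custom header, urllib's default opener identifies itself
# as "Python-urllib/<major>.<minor>", which some sites block outright.
default_ua = dict(build_opener().addheaders)["User-agent"]
print(default_ua)

# Request stores header names via str.capitalize(), so "User-Agent"
# is saved under the key "User-agent" and replaces the default when sent.
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent"))  # Mozilla/5.0
print(req.has_header("User-agent"))  # True
```

Note that get_header() looks up the capitalized form, so req.get_header("User-Agent") returns None even though the header is set.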
Hope this helps.