python timer + urllib2 code errors
I am trying to pull information from a site every 5 seconds, but it doesn't seem to be working and I get errors every time I run it.
Code below:
import urllib2, threading
def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/'[1])
            print(a.split('">')[0])
t = threading.Timer(5.0, readpage)
t.start()
I get these errors:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 808, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 1080, in run
    self.function(*self.args, **self.kwargs)
  File "C:\Users\Jordan\Desktop\username.py", line 3, in readpage
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read()
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Help would be appreciated, thanks!
Did you try opening that URL without the thread? The error code says 403: Forbidden, so maybe you need authentication for that web page.
This has nothing to do with Python: the server is denying your requests to that URL.
I suspect that either the URL is incorrect or you've hit some kind of rate limiting and are being blocked.
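To confirm that the refusal comes from the server rather than from the threading code, the status can be read off the exception. This is only a sketch: `fetch_status` and its `opener` parameter are names made up for illustration, and the import fallback is there so the same snippet also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
try:
    from urllib2 import urlopen, HTTPError   # Python 2, as used in this thread
except ImportError:
    from urllib.request import urlopen       # Python 3 equivalent
    from urllib.error import HTTPError


def fetch_status(url, opener=urlopen):
    """Return (status_code, body); body is None when the server refuses.

    `opener` is a hypothetical hook so the function can be tried without
    touching the network; by default it is the normal urlopen.
    """
    try:
        response = opener(url)
        return response.getcode(), response.read()
    except HTTPError as err:
        # err.code holds the HTTP status, e.g. 403 for Forbidden
        return err.code, None


# Example call (hits the network, so it is left commented out):
# status, body = fetch_status('http://forums.zybez.net/runescape-2007-prices')
```

A 403 returned here, outside any timer thread, would point at the request itself (headers, URL, or rate limiting) rather than at the Python code.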
The site is blocking Python's User-Agent. Try this:
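import urllib2, threading

def readpage():
    # Send a browser-like User-Agent; the server rejects urllib2's default one
    headers = { 'User-Agent' : 'Mozilla/5.0' }
    req = urllib2.Request('http://forums.zybez.net/runescape-2007-prices', None, headers)
    # splitlines() so the loop sees whole lines rather than single characters
    data = urllib2.urlopen(req).read().splitlines()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            # Index the list returned by split(), not the separator string
            a = line.split('/runescape-2007-prices/player/')[1]
            print(a.split('">')[0])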
The site is rejecting the default User-Agent reported by urllib2. You can change it for all requests in the script using install_opener.
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)
You'll also need to split the data returned by the site so that you can read it line by line:
urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
and change
line.split('/runescape-2007-prices/player/'[1])
to
line.split('/runescape-2007-prices/player/')[1]
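To see why both changes matter, here is a small sketch run against a made-up HTML fragment. The sample markup and the player name are hypothetical and the real page may look different; `marker`, `chars` and `names` are illustrative names only.

```python
# Hypothetical sample standing in for the page body; the real markup may differ.
sample = (
    '<a href="http://forums.zybez.net/runescape-2007-prices/player/SomePlayer">x</a>\n'
    '<p>unrelated line</p>\n'
)

marker = '/runescape-2007-prices/player/'

# Indexing the string literal picks out one character, so the original call
# was really line.split('r'), which returns a list (and a list has no .split).
print(marker[1])

# Iterating over a string yields single characters, so the multi-character
# marker can never be "in" any of them and nothing is selected.
chars = [line for line in sample if marker in line]
print(chars)

# splitlines() yields whole lines, which is what the loop expects.
names = []
for line in sample.splitlines():
    if 'forums.zybez.net' + marker in line:
        after_marker = line.split(marker)[1]      # index the result of split()
        names.append(after_marker.split('">')[0])
print(names)
```

With this sample the first loop selects nothing, while the second one extracts the single player name.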
Working:
import urllib2, threading
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)
def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/')[1]
            print(a.split('">')[0])
t = threading.Timer(5.0, readpage)
t.start()
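One thing to keep in mind: threading.Timer calls its function once after the delay, not repeatedly, so a script that starts a single Timer reads the page only one time. Below is a minimal sketch of one way to get the "every 5 seconds" behaviour. It swaps Timer for a worker thread that waits on a threading.Event; `run_every` and `stop_event` are names invented for this example, not part of the threading module.

```python
import threading


def run_every(interval, func, stop_event):
    """Call func() every `interval` seconds until stop_event is set.

    Returns the worker thread so the caller can join it after stopping.
    """
    def loop():
        # Event.wait() returns False on timeout and True once the event is
        # set, so the loop body runs once per interval until asked to stop.
        while not stop_event.wait(interval):
            func()

    worker = threading.Thread(target=loop)
    worker.daemon = True   # do not keep the interpreter alive on exit
    worker.start()
    return worker


# Example usage, assuming the readpage() function from the post above
# (left commented out because it would run until stopped):
# stop = threading.Event()
# run_every(5.0, readpage, stop)
# ...
# stop.set()   # ends the loop at the next wake-up
```

Waiting on the Event instead of sleeping means stop.set() ends the loop promptly rather than after a full interval. If readpage() can raise (for example on a 403), wrapping the call in try/except inside the loop keeps the worker alive.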