Server error when using urllib2 on Google AppEngine

I am unsure why hosting this simple code on Google AppEngine returns a server error whenever a query is submitted through the form. The problem seems to be the line html = urllib2.urlopen("http://google.com/search?q=" + q).read(), as the code works fine without it.

import webapp2
import urllib2


form="""
<form action="/process">
    <input name="q">
    <input type="submit">
</form>
"""


class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.out.write(form)


class ProcessHandler(webapp2.RequestHandler):
    def get(self):
        q = self.request.get("q")
        html = urllib2.urlopen("http://google.com/search?q=" + q).read()
        self.response.out.write(html)


app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/process', ProcessHandler)],
                               debug=True)

This is the error returned:

Error: Server Error
The server encountered an error and could not complete your request.

If the problem persists, please report your problem and mention this error message and the query that caused it.

www.google.com probably doesn't accept this kind of direct connection and rejects requests from particular user agents. In a plain Python environment you could change the user-agent string, but I don't think that is possible on Google App Engine.

Google is returning a 403 for your search request:

>>> import urllib2
>>> html = urllib2.urlopen("http://google.com/search?q=Test").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

This works, however:

html = urllib2.urlopen("http://google.com").read()

So it looks like Google is trying to block this kind of search request. As the other poster suggested, changing the User-Agent string might stop the 403. Pick something common!
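The 403 above was found by reproducing the call in an interpreter. Inside a request handler the same information is hidden behind the generic server error unless the exception is caught. Below is a minimal sketch of one way to surface it. The helper name fetch_status and its opener parameter are hypothetical and not part of urllib2, and the import fallback is only there so the sketch also runs on Python 3.

```python
try:  # Python 2, as used in the question
    from urllib2 import urlopen, HTTPError
except ImportError:  # Python 3 fallback so the sketch runs anywhere
    from urllib.request import urlopen
    from urllib.error import HTTPError


def fetch_status(url, opener=urlopen):
    """Return (status_code, body_or_message) instead of raising on HTTP errors.

    `opener` is a hypothetical hook: any callable that takes a URL and
    returns a response object, which makes the function easy to test.
    """
    try:
        response = opener(url)
        return response.getcode(), response.read()
    except HTTPError as err:
        # HTTPError carries the status code and the server's reason phrase.
        return err.code, err.msg
```

A handler could call fetch_status with the search URL and then log the returned code or write it into the response, so that a 403 shows up as a 403 rather than as an unexplained server error.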

I've just tested with a Mozilla user agent set, and I get the results I think you are looking for:

import urllib2
headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://google.com/search?q=Test', None, headers)
html = urllib2.urlopen(req).read()
print html
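The snippet above hard-codes the query. In the question the query comes from the form and is concatenated into the URL unescaped, so spaces or an & in the input would probably produce a malformed request. Below is a sketch of building the same request with the query URL-encoded. The helper name build_search_request is hypothetical, and the import fallback is only there so the sketch also runs on Python 3.

```python
try:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 fallback so the sketch runs anywhere
    from urllib.parse import urlencode
    from urllib.request import Request


def build_search_request(q, user_agent='Mozilla/5.0'):
    """Return a Request for the search URL with q URL-encoded."""
    # urlencode escapes spaces, '&', '#', and other reserved characters.
    url = 'http://google.com/search?' + urlencode({'q': q})
    return Request(url, None, {'User-Agent': user_agent})


req = build_search_request('app engine & urllib2')
print(req.get_full_url())
```

The resulting object can be passed to urllib2.urlopen(req) exactly as in the snippet above. On Python 2, input containing non-ASCII characters would most likely need to be encoded to UTF-8 before being handed to urlencode.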
