TypeError: expected httplib.Message, got <type 'instance'>. when using requests.get(url) on GAE

My aim is to build a web crawler and host it on GAE. However, when I try to run a very basic implementation, I get the following error:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "E:\WSE_NewsClusteriing\crawler\crawler.py", line 14, in get
    source_code = requests.get(url)
  File "libs\requests\api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "libs\requests\api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "libs\requests\sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "libs\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "libs\requests\adapters.py", line 376, in send
    timeout=timeout
  File "libs\requests\packages\urllib3\connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "libs\requests\packages\urllib3\connectionpool.py", line 390, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "libs\requests\packages\urllib3\util\response.py", line 49, in assert_header_parsing
    type(headers)))
TypeError: expected httplib.Message, got <type 'instance'>.

My main.py is as follows:

import sys
# Make the third-party packages vendored in libs/ (requests, bs4) importable.
sys.path.insert(0, 'libs')

import webapp2
import requests
from bs4 import BeautifulSoup

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        url = 'http://www.bbc.com/news/world'
        source_code = requests.get(url)  # this is the call that fails on GAE
        plain_text = source_code.text
        # Name the parser explicitly; bs4 warns when it has to guess one.
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.find_all('a', {'class': 'title-link'}):
            href = 'http://www.bbc.com' + link.get('href')
            self.response.write(href + '\n')  # one link per line


app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)

The crawler itself works fine as a standalone Python application.

Can someone help me figure out what's wrong here? Does the requests module have compatibility issues with GAE?

I would advise against using the requests library on App Engine for the time being, as it is not officially supported, so you are quite likely to run into compatibility issues like this one. According to the URL Fetch Python API article, the supported options include urllib, urllib2, httplib, and calling urlfetch directly. requests is also built on top of the urllib3 library (the traceback shows the copy bundled under requests/packages/urllib3), and urllib3 is not yet supported either.
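The traceback shows where the incompatibility surfaces: requests hands the response to its bundled urllib3, whose `assert_header_parsing` raises this `TypeError` when the header object it receives is not an instance of the `httplib.HTTPMessage` class it expects. The sketch below is a simplified re-creation of that kind of `isinstance` check, not urllib3's actual code; `ExpectedHTTPMessage`, `SubstituteMessage`, `assert_header_parsing_sketch`, and `check` are hypothetical stand-ins. It shows that a header object can behave like a message and still be rejected if it comes from a different class, which is presumably what happens when the runtime supplies its own HTTP implementation.

```python
class ExpectedHTTPMessage(object):
    """Hypothetical stand-in for the httplib.HTTPMessage class urllib3 checks for."""


class SubstituteMessage(object):
    """Hypothetical header object from a substituted HTTP implementation.

    It offers a dict-like get(), but it does not inherit from
    ExpectedHTTPMessage, so an isinstance() check rejects it.
    """

    def __init__(self, headers):
        self._headers = dict(headers)

    def get(self, name, default=None):
        return self._headers.get(name, default)


def assert_header_parsing_sketch(headers):
    """Simplified version of the type check named in the traceback."""
    if not isinstance(headers, ExpectedHTTPMessage):
        raise TypeError('expected httplib.Message, got {0}.'.format(type(headers)))


def check(headers):
    """Return None if the check passes, otherwise the TypeError message."""
    try:
        assert_header_parsing_sketch(headers)
    except TypeError as exc:
        return str(exc)
    return None


substitute = SubstituteMessage({'Content-Type': 'text/html'})
print(substitute.get('Content-Type'))  # text/html: it works like a header map...
print(check(ExpectedHTTPMessage()))    # None: the expected class passes
print(check(substitute))               # ...but the type check still rejects it
```

Because the rejection happens inside the HTTP stack rather than in the handler, rearranging the crawler code around `requests.get(url)` will not make the error go away.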

Feel free to consult the URL Fetch documentation for simple examples of urllib2 and urlfetch requests. If these libraries don't work for you for some reason, please mention that in your question.
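As a minimal sketch of the urllib2 route, a helper like the one below could stand in for `requests.get(url).text` in the handler. It assumes the page is UTF-8 encoded (a real crawler should read the charset from the response headers), and `fetch_text` and its `opener` parameter are hypothetical names; `opener` exists only so the function can be exercised without network access. The import fallback keeps it usable on both the Python 2.7 runtime from the question and Python 3.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:  # Python 2.7, the GAE runtime used in the question
    from urllib2 import urlopen


def fetch_text(url, opener=urlopen, timeout=10):
    """Fetch `url` and return the response body as text.

    Assumes a UTF-8 body; undecodable bytes are replaced rather than raising.
    `opener` defaults to urlopen and is a parameter only for testing.
    """
    response = opener(url, timeout=timeout)
    try:
        body = response.read()
    finally:
        # Release the connection even if reading fails.
        response.close()
    return body.decode('utf-8', 'replace')
```

In the handler, `plain_text = fetch_text(url)` would take the place of the two `requests` lines. On App Engine you could also call the URL Fetch service directly with `from google.appengine.api import urlfetch` and `urlfetch.fetch(url)`, whose result exposes `status_code` and `content`.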

This question is almost two years old, but I just ran into the same problem on App Engine. For the benefit of anyone who hits a similar issue, the docs describe how to issue HTTP(S) requests:

import requests
import requests_toolbelt.adapters.appengine

# Use the App Engine Requests adapter. This makes sure that Requests uses
# URLFetch.
requests_toolbelt.adapters.appengine.monkeypatch()
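As I understand it, `monkeypatch()` works by replacing the HTTP adapter class that requests looks up when it creates a `Session`, so sessions created afterwards (including the throwaway one behind `requests.get`) go through URLFetch. The sketch below illustrates that pattern with hypothetical stand-ins (`fake_sessions`, `SocketAdapter`, `URLFetchAdapter`, `Session`, `patch_adapter`) rather than the real requests internals. Its practical consequence is to call the patch once at import time, as in the snippet above, because a session built before the patch keeps the adapter it already has.

```python
import types

# Hypothetical stand-in for a module such as requests.sessions, which looks
# up the adapter class by its module-level name each time a session is built.
fake_sessions = types.ModuleType('fake_sessions')


class SocketAdapter(object):
    """Hypothetical default adapter (plain sockets)."""
    transport = 'sockets'


class URLFetchAdapter(object):
    """Hypothetical adapter that would route requests through URLFetch."""
    transport = 'urlfetch'


fake_sessions.HTTPAdapter = SocketAdapter


class Session(object):
    def __init__(self):
        # The class is resolved when the session is created, so sessions made
        # after a patch pick up the replacement, but earlier ones do not.
        self.adapter = fake_sessions.HTTPAdapter()


def patch_adapter():
    """Swap the module-level adapter class, as a monkeypatch would."""
    fake_sessions.HTTPAdapter = URLFetchAdapter


early = Session()  # created before patching
patch_adapter()
late = Session()   # created after patching
print(early.adapter.transport)  # sockets
print(late.adapter.transport)   # urlfetch
```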

Reference: https://cloud.google.com/appengine/docs/standard/python/issue-requests
