
TypeError: expected httplib.Message, got <type 'instance'> when using requests.get(url) on GAE

My aim is to build a web crawler and host it on GAE. However, when I try to execute a very basic implementation, I get the following error:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2-2.5.2\webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "E:\WSE_NewsClusteriing\crawler\crawler.py", line 14, in get
    source_code = requests.get(url)
  File "libs\requests\api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "libs\requests\api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "libs\requests\sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "libs\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "libs\requests\adapters.py", line 376, in send
    timeout=timeout
  File "libs\requests\packages\urllib3\connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "libs\requests\packages\urllib3\connectionpool.py", line 390, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "libs\requests\packages\urllib3\util\response.py", line 49, in assert_header_parsing
    type(headers)))
TypeError: expected httplib.Message, got <type 'instance'>.

My main.py is as follows:

import sys
sys.path.insert(0, 'libs')

import webapp2
import requests
from bs4 import BeautifulSoup

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        url = 'http://www.bbc.com/news/world'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'title-link'}):
            href = 'http://www.bbc.com' + link.get('href')
            self.response.write(href)


app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)

The thing is that the crawler works fine as a standalone Python application.

Can someone help me figure out what's wrong here? Does the requests module have compatibility issues with GAE?

I would advise against using the requests library on App Engine for the time being, as it is not officially supported, so you are very likely to run into compatibility issues. As per the URL Fetch Python API article, the supported options are urllib, urllib2, httplib, and using urlfetch directly. The requests library also relies on urllib3 (your traceback shows it bundled under requests\packages\urllib3), and that library is not yet supported either.

Feel free to consult the URL Fetch documentation for simple examples of urllib2 and urlfetch requests. If these libraries do not work for you in some way, please say so in your question.
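
As a rough illustration of the urllib2 route, here is a minimal sketch of the crawler's fetch-and-extract step without requests. The names fetch, TitleLinkParser and extract_title_links are hypothetical helpers, not part of any library, and the link extraction uses the standard library HTMLParser in place of BeautifulSoup only to keep the sketch dependency-free. The import fallback is an assumption added so the same code can be tried locally on Python 3.

```python
try:
    # Python 2, the runtime used in the question; on App Engine urllib2
    # requests are served through URL Fetch.
    from urllib2 import urlopen
    from HTMLParser import HTMLParser
except ImportError:
    # Python 3 fallback so the sketch can also be tried locally.
    from urllib.request import urlopen
    from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose class list contains 'title-link'."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attr_map = dict(attrs)
        classes = (attr_map.get('class') or '').split()
        href = attr_map.get('href')
        if 'title-link' in classes and href:
            self.hrefs.append(href)


def extract_title_links(html, base='http://www.bbc.com'):
    """Return absolute URLs for all 'title-link' anchors found in html."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return [base + href for href in parser.hrefs]


def fetch(url, timeout=10):
    """Download url and return the body as text (assumes UTF-8 content)."""
    response = urlopen(url, timeout=timeout)
    try:
        return response.read().decode('utf-8', 'replace')
    finally:
        response.close()
```

Inside the handler's get method, requests.get(url).text would then become fetch(url), and the BeautifulSoup loop could either stay as it is or be replaced by extract_title_links(fetch(url)).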

This question is almost two years old, but I stumbled upon the same error on App Engine just now. For the benefit of those who come across a similar issue, the docs describe how to issue HTTP(S) requests with requests:

import requests
import requests_toolbelt.adapters.appengine

# Use the App Engine Requests adapter. This makes sure that Requests uses
# URLFetch.
requests_toolbelt.adapters.appengine.monkeypatch()

Reference: https://cloud.google.com/appengine/docs/standard/python/issue-requests
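
If the same module also has to run as a standalone script, as the question mentions, one option is to apply the patch only when the code is actually running on App Engine. The following is a sketch under assumptions: running_on_app_engine and enable_urlfetch_adapter are hypothetical helper names, and the check relies on the SERVER_SOFTWARE environment variable starting with "Google App Engine/" in production and "Development/" on the local development server.

```python
import os


def running_on_app_engine(environ=None):
    """Guess whether we are on App Engine from SERVER_SOFTWARE (assumption)."""
    environ = os.environ if environ is None else environ
    server = environ.get('SERVER_SOFTWARE', '')
    return (server.startswith('Google App Engine/')
            or server.startswith('Development/'))


def enable_urlfetch_adapter(environ=None):
    """Apply the requests_toolbelt monkeypatch only on App Engine.

    Returns True if the patch was applied, False if it was skipped.
    """
    if not running_on_app_engine(environ):
        return False
    # Imported lazily so the module still loads where the toolbelt's
    # App Engine adapter cannot be used.
    import requests_toolbelt.adapters.appengine
    requests_toolbelt.adapters.appengine.monkeypatch()
    return True
```

Calling enable_urlfetch_adapter() once at the top of main.py, before the first requests.get call, keeps the rest of the crawler unchanged in both environments.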
