
Scrapy proxy IP does not work with HTTPS, returns 'ssl handshake failure'

Scrapy works with my proxy IP for an http request but not for an https request.

I know my proxy IP works with http because I tested it by sending a request to http://ipinfo.io/ip:

2016-03-28 12:10:42 [scrapy] DEBUG: Crawled (200) <GET http://ipinfo.io/ip> (referer: http://www.google.com)
2016-03-28 12:10:42 [root] INFO:  *** TEST, WHAT IS MY IP: ***
107.183.7.XX

I know it's not working with an https request because of this error message:

2016-03-28 12:10:55 [scrapy] DEBUG: Gave up retrying <GET https://www.my-company-url.com> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl23_read', 'ssl handshake failure')]>]

My settings.py contains:

DOWNLOADER_MIDDLEWARES = {
    'crystalball.middlewares.ProxyMiddleware': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110
}

My crystalball.middlewares.ProxyMiddleware contains:

import base64

class ProxyMiddleware(object):

    def process_request(self, request, spider):
        request.meta['proxy'] = "https://107.183.X.XX:55555"
        proxy_user_pass = "hXXbp3:LitSwDXX99"
        encoded_user_pass = base64.encodestring(proxy_user_pass)
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

Any suggestions on what I should experiment with next?

Side note: the solutions in this SO post have not worked: Scrapy and proxies.

The culprit is base64.encodestring(), which appends an unwanted newline character (\n) to the encoded value, so the Proxy-Authorization header of the request ends with a stray \n.

The solution is simply to strip() off that \n.
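The trailing newline is easy to see in isolation. The snippet below is a minimal sketch for Python 3, where base64.encodestring() no longer exists (it was removed in Python 3.9) and base64.encodebytes() is the equivalent call with the same behaviour. The credentials are made up for illustration.

```python
import base64

# Hypothetical credentials, for illustration only.
proxy_user_pass = b"user:secret"

# encodebytes() is the Python 3 name for the old encodestring().
encoded = base64.encodebytes(proxy_user_pass)

print(repr(encoded))            # the repr shows the trailing newline
print(encoded.endswith(b"\n"))  # True
print(repr(encoded.strip()))    # same token without the newline
```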

Change this line:

request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

To this:

request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass.strip()
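An alternative that avoids the problem altogether is base64.b64encode(), which never appends a newline. Below is a rough Python 3 sketch of the middleware written that way; it is not the original poster's code. The proxy URL and credentials are placeholders, and the class only relies on the request having meta and headers mappings, as the middleware in the question does.

```python
import base64


class ProxyMiddleware:
    # Placeholder values (assumptions); substitute your own proxy and credentials.
    PROXY_URL = "http://proxy.example.com:8080"
    PROXY_USER_PASS = "user:secret"

    def process_request(self, request, spider):
        request.meta["proxy"] = self.PROXY_URL
        # b64encode() returns the bare token, with no trailing newline to strip.
        token = base64.b64encode(self.PROXY_USER_PASS.encode("utf-8"))
        request.headers["Proxy-Authorization"] = b"Basic " + token
```

With this variant there is nothing to strip(), because the header value is built from the bare token.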
