
Python requests not working properly with proxies and https pages

I am trying to write a scraper in Python using requests with proxies to scrape an https page. I found lists of free proxies on the internet and manually validated a bunch of them in an online proxy checker. I also made sure to use only proxies that support https according to the website. But in Python nearly all of them fail for http pages, and ALL of them fail for my desired https page. I did everything according to the tutorials I found, and I am running out of ideas as to what the issue could be. I plan to look into the actual error messages without the try/except today, but I hoped someone could tell me if the code is valid in the first place.

    def proxy_json_test_saved_proxies(self):
        test_count = 1
        timeout_seconds = 10
        working_http = 0
        working_https = 0
        for proxy_dict in self.all_proxies:
            print("#######")
            print("Testing http proxy " + str(test_count) + "/" + str(len(self.all_proxies)))
            test_count += 1
            proxy = {'http':'http://' + proxy_dict["address"],
                        'https':'https://' + proxy_dict["address"]
                    }
            print(proxy)
            print("Try http connection:")
            try:
                requests.get("http://example.com", proxies = proxy, timeout = timeout_seconds)
            except IOError:
                print("Fail")
            else:
                print("Success")
                working_http += 1

            print("Try https connection:")
            try:
                requests.get("https://example.com", proxies = proxy, timeout = timeout_seconds)
            except IOError:
                print("Fail")
            else:
                print("Success")
                working_https += 1
            print("Working http: ", working_http)
            print("Working https: ", working_https)

proxy_dict["address"] contains ip:port values like "185.247.177.27:80". self.all_proxies is a list of about 100 of those proxy_dicts.

I also know that these free proxies are often already in use. So I repeated the routine multiple times, without ANY of them working for https and with no real improvement in the http count either.
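As a side note on seeing the actual error messages without removing the try/except: the sketch below returns the exception type and text instead of printing only "Fail". The helper name `attempt` and its `fetch` parameter are made up for illustration and are not part of the code above. It relies on the fact that requests' exceptions derive from IOError (an alias of OSError in Python 3), which is also why the `except IOError` in the code above catches them.

```python
def attempt(fetch, url, **kwargs):
    """Call fetch(url, **kwargs) and report the outcome.

    `fetch` is any callable with a requests.get-like signature
    (hypothetical parameter, used here so the helper has no dependencies).
    Returns (True, "ok") on success, or (False, "<ExceptionName>: <message>").
    """
    try:
        fetch(url, **kwargs)
    except OSError as exc:
        # Keep the exception type and message instead of discarding them.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


# Possible usage inside the loop above (assumes `import requests`):
#   ok, detail = attempt(requests.get, "https://example.com",
#                        proxies=proxy, timeout=timeout_seconds)
#   print("Success" if ok else "Fail -> " + detail)
```

The reported exception name (for example a proxy error versus a timeout) should make it easier to tell a dead proxy from a misconfigured one.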

Me again. I solved the issue and wanted to post the answer. In the end it was just a typo in the proxy definition. The proxy server is reached via http, no matter whether the target URL uses http or https.

I changed this:

proxy = {'http':'http://' + proxy_dict["address"],
         'https':'https://' + proxy_dict["address"]
        }

To this (deleted the "s" in the https string):

proxy = {'http':'http://' + proxy_dict["address"],
         'https':'http://' + proxy_dict["address"]
        }

And now it works.
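For reference, the fix can be wrapped in a small helper. This is only a sketch: the name `build_proxies` is made up here, and it assumes the proxy speaks plain HTTP on its listening port, as was the case for the free proxies above. The dictionary keys select which target URLs a proxy applies to, while the scheme in each value describes how the client connects to the proxy itself.

```python
def build_proxies(address):
    """Build a requests-style proxies dict for a plain-HTTP proxy.

    `address` is an "ip:port" string such as "185.247.177.27:80".
    Both keys point at the same http:// proxy URL: the key is matched
    against the scheme of the target URL, the value says how to reach
    the proxy.
    """
    proxy_url = "http://" + address
    return {"http": proxy_url, "https": proxy_url}


# Possible usage (assumes `import requests`):
#   proxy = build_proxies(proxy_dict["address"])
#   requests.get("https://example.com", proxies=proxy, timeout=10)
```

With this setup, a request to an https page is still encrypted end to end: the client asks the proxy to open a tunnel to the target and negotiates TLS with the target through it.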
