Bot on Heroku - can't scrape sites because of captcha, even though everything works fine on my PC

Question

I have a simple bot on Heroku that works with Discord and scrapes sites. Normally I use the requests module to scrape sites: I get the page source and that's all. (Note: the bot doesn't spam-ping sites. It only requests them once per day or week. The site I'm pinging is Epic Games, but it's not the only one with a captcha.)

Later I discovered that the page source I received contained captcha protection, so I decided to use chromedriver. After setting up chromedriver on Heroku, I still got the captcha protection on those sites. On my PC it worked completely fine, even without any of the options below, and it never asked for captcha verification.

This is what I tried (note: I use undetected chromedriver, an optimized version of Selenium's chromedriver):

1. The page source asked for JavaScript to be enabled, so I added a chromedriver option:

import undetected_chromedriver as uc  # the code below refers to the module as uc

opts = uc.ChromeOptions()
opts.add_argument("--enable-javascript")
driver = uc.Chrome(use_subprocess=True, options=opts)

driver.get(url)
print(driver.page_source)

It still showed the captcha verification, but the JavaScript error was gone.
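Because every attempt is judged by reading printed page source, it helps to detect the challenge page programmatically so the bot can log it or retry instead of parsing it. The following is a minimal sketch that assumes the blocked page can be recognised by marker strings. The markers and the `looks_like_captcha` helper are hypothetical examples, not what Epic Games actually serves, so inspect the real blocked page and adjust them.

```python
# Hypothetical marker strings; replace them with ones seen on the real blocked page.
CAPTCHA_MARKERS = (
    "g-recaptcha",
    "h-captcha",
    "cf-challenge",
    "please enable javascript",
)


def looks_like_captcha(page_source: str, markers=CAPTCHA_MARKERS) -> bool:
    """Return True if any marker appears in the page source (case-insensitive)."""
    lowered = page_source.lower()
    return any(marker.lower() in lowered for marker in markers)


if __name__ == "__main__":
    blocked = '<div class="h-captcha" data-sitekey="abc"></div>'
    normal = "<html><body><h1>Free games</h1></body></html>"
    print(looks_like_captcha(blocked))  # True
    print(looks_like_captcha(normal))   # False
```

In the bot, this would be called as `looks_like_captcha(driver.page_source)` right after `driver.get(url)`.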

2. After doing some research, I found that Heroku's IP might be on some sort of block list, and it was suggested that I add a proxy to the chromedriver options:

import undetected_chromedriver as uc  # the code below refers to the module as uc

opts = uc.ChromeOptions()
opts.add_argument("--enable-javascript")
opts.add_argument("--proxy-server=socks5://hostip:port")  # placeholder host and port
driver = uc.Chrome(use_subprocess=True, options=opts)

driver.get(url)
print(driver.page_source)
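As far as I know, Chrome's `--proxy-server` switch cannot carry a username and password. If the proxy provider requires authentication, this attempt may therefore not route traffic the way you expect, which is why authenticated setups use an extension like the one in attempt 3. The following is a minimal sketch of building the switch from a proxy URL that rejects credentials up front. `proxy_server_arg` is a hypothetical helper, and 203.0.113.5 is a documentation-only IP address.

```python
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}


def proxy_server_arg(proxy_url: str) -> str:
    """Build a Chrome --proxy-server argument from a proxy URL.

    Raises ValueError if the URL carries credentials, because the
    Chrome switch has no way to pass them on.
    """
    parts = urlsplit(proxy_url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if parts.username or parts.password:
        raise ValueError("--proxy-server ignores credentials; use an auth extension")
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy URL needs both a host and a port")
    # Re-add brackets for IPv6 literals, which urlsplit strips from hostname.
    host = f"[{parts.hostname}]" if ":" in parts.hostname else parts.hostname
    return f"--proxy-server={parts.scheme}://{host}:{parts.port}"
```

Usage: `opts.add_argument(proxy_server_arg("socks5://203.0.113.5:1080"))`.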

3. I found an option similar to the second one that seemed to work for others, but the site still showed the captcha verification:

import undetected_chromedriver as uc  # the code below refers to the module as uc
import os
import shutil
import tempfile

class ProxyExtension:
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {"scripts": ["background.js"]},
        "minimum_chrome_version": "76.0.0"
    }
    """

    background_js = """
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "http",
                host: "%s",
                port: %d
            },
            bypassList: ["localhost"]
        }
    };

    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

    function callbackFn(details) {
        return {
            authCredentials: {
                username: "%s",
                password: "%s"
            }
        };
    }

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        { urls: ["<all_urls>"] },
        ['blocking']
    );
    """

    def __init__(self, host, port, user, password):
        self._dir = os.path.normpath(tempfile.mkdtemp())

        manifest_file = os.path.join(self._dir, "manifest.json")
        with open(manifest_file, mode="w") as f:
            f.write(self.manifest_json)

        background_js = self.background_js % (host, port, user, password)
        background_file = os.path.join(self._dir, "background.js")
        with open(background_file, mode="w") as f:
            f.write(background_js)

    @property
    def directory(self):
        return self._dir

    def __del__(self):
        shutil.rmtree(self._dir)


if __name__ == "__main__":
    proxy = ("hostip", port, "username", "pass")  # placeholders: your proxy's host, port, user, password
    proxy_extension = ProxyExtension(*proxy)

    options = uc.ChromeOptions()
    options.add_argument("--enable-javascript")
    options.add_argument(f"--load-extension={proxy_extension.directory}")
    driver = uc.Chrome(use_subprocess=True, options=options)
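One fragile spot in attempt 3 is that the host, username and password are substituted into JavaScript string literals with plain `%s`. A password containing a quote or backslash therefore produces an invalid background.js, and the proxy is silently not applied. The following is a minimal sketch of a safer rendering step that assumes you keep the same Manifest V2 extension layout. `render_background_js` is a hypothetical helper, not part of undetected_chromedriver. It uses `json.dumps`, which emits a correctly escaped, quoted string that is also a valid JavaScript string literal.

```python
import json

# Same proxy config as ProxyExtension.background_js, but the string
# placeholders carry no quotes of their own: json.dumps() supplies them.
BACKGROUND_JS_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: "http", host: %s, port: %d},
        bypassList: ["localhost"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %s, password: %s}};
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def render_background_js(host: str, port: int, user: str, password: str) -> str:
    """Fill the template, escaping every string value via json.dumps."""
    return BACKGROUND_JS_TEMPLATE % (
        json.dumps(host),
        int(port),
        json.dumps(user),
        json.dumps(password),
    )
```

Inside `ProxyExtension.__init__`, you would write `render_background_js(host, port, user, password)` to background.js instead of `self.background_js % (host, port, user, password)`.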

I've also tried other options, such as adding --headless, changing the user agent to Firefox and adding a no-GPU option.
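Two of those tweaks can work against you. Chrome's classic headless mode advertises itself with a `HeadlessChrome/<version>` token in its user agent, which is an easy bot signal. A Firefox user agent on a Chrome browser may also disagree with other properties a site can read. A less conspicuous option is to keep Chrome's own user agent and only drop the headless token. The following is a minimal sketch. The helper names are hypothetical, and depending on its version, undetected_chromedriver may already do this for you in headless mode.

```python
def normalized_user_agent(user_agent: str) -> str:
    """Turn a headless Chrome user agent into the one a normal Chrome window sends."""
    # Classic headless mode reports 'HeadlessChrome/<version>' instead of 'Chrome/<version>'.
    return user_agent.replace("HeadlessChrome/", "Chrome/")


def user_agent_arg(user_agent: str) -> str:
    """Build the Chrome --user-agent switch for ChromeOptions.add_argument."""
    return f"--user-agent={normalized_user_agent(user_agent)}"
```

The current value can be read once with `driver.execute_script("return navigator.userAgent")`. The result of `user_agent_arg(...)` can then be passed to `options.add_argument` for the next browser launch.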

I've been trying to fix this for a month, and I hope someone knows the answer to my problem.

Answer 1

You are likely receiving the captcha because Heroku runs on datacenter IPs, which are probably flagged or treated as suspicious. You have a couple of options. You could try a residential proxy and hope it's not flagged, so that you don't get a captcha. Alternatively, you could pay for a captcha-solving service such as 2Captcha or Capmonster. I'm not sure exactly what type of captcha you are getting, but both services support reCaptcha. The 2Captcha Docs have a lot of good information on submitting the captcha once you solve it.
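For the paid-solver route, the following sketch shows the usual submit-then-poll flow for a reCAPTCHA v2 task. It assumes 2Captcha's legacy `in.php`/`res.php` HTTP API with `json=1` responses (`status` 1 on success, `CAPCHA_NOT_READY` while the worker is still solving). Verify the endpoints, parameters and response codes against their current docs before relying on it. `solve_recaptcha` is a hypothetical helper, the site key is the `data-sitekey` attribute on the page's reCAPTCHA element, and `fetch`/`sleep` are injectable so the logic can be tested offline.

```python
import json
import time
import urllib.parse
import urllib.request

IN_URL = "https://2captcha.com/in.php"
RES_URL = "https://2captcha.com/res.php"


def http_get_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode())


def solve_recaptcha(api_key, site_key, page_url, fetch=http_get_json,
                    poll_interval=5.0, max_polls=24, sleep=time.sleep):
    """Submit a reCAPTCHA v2 task, then poll until a token is ready."""
    submit = {"key": api_key, "method": "userrecaptcha",
              "googlekey": site_key, "pageurl": page_url, "json": 1}
    reply = fetch(f"{IN_URL}?{urllib.parse.urlencode(submit)}")
    if reply.get("status") != 1:
        raise RuntimeError(f"submit failed: {reply.get('request')}")
    task_id = reply["request"]

    poll = {"key": api_key, "action": "get", "id": task_id, "json": 1}
    for _ in range(max_polls):
        sleep(poll_interval)  # give the solver time before each check
        reply = fetch(f"{RES_URL}?{urllib.parse.urlencode(poll)}")
        if reply.get("status") == 1:
            return reply["request"]  # the g-recaptcha-response token
        if reply.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(f"solve failed: {reply.get('request')}")
    raise TimeoutError("captcha was not solved in time")
```

Once you have the token, the 2Captcha docs describe placing it in the page's `g-recaptcha-response` field before submitting. With Selenium, that could look like `driver.execute_script('document.getElementById("g-recaptcha-response").innerHTML = arguments[0];', token)`. Some sites also need their own JavaScript callback triggered, and that depends on the page.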

Answer 1: score 0, posted 2022-08-18 16:50:11
