
Bypassing CAPTCHAs with Headless Chrome using puppeteer

Google detects that my browser is being manipulated/controlled/automated by software, and because of that I get a reCAPTCHA. When I start Chromium manually and perform the same steps, the reCAPTCHA doesn't appear.

Question 1)

Is it possible to solve the CAPTCHA programmatically, or get rid of it, when using puppeteer? Is there any way to solve this?

Question 2)

Does this happen only when running without the headless option, i.e.

const browser = await puppeteer.launch({
  headless: false
})

Or is this something we just have to accept and move on?

Try generating a random user agent using this npm package (user-agents). This usually gets around user-agent-based protection.

In puppeteer, a page can override the browser's user agent with page.setUserAgent:

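const UserAgent = require('user-agents');
...
// Generate a random, realistic user-agent string and apply it to the page
const userAgent = new UserAgent();
await page.setUserAgent(userAgent.toString());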
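If you would rather not depend on a package, the same idea can be sketched by hand, assuming you maintain your own list of real desktop user-agent strings: pick one at random and fall back to a fixed default when the list is empty. `pickUserAgent` and `DEFAULT_UA` are hypothetical names, not part of the user-agents package; the default string is the Chrome 69 one used elsewhere in this thread, with the standard `Mozilla/5.0` prefix.

```javascript
// Hypothetical fallback used when no candidate user agents are supplied.
const DEFAULT_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36';

// Pick a random user-agent string from `candidates`.
// `rng` is injectable so the choice can be made deterministic in tests.
function pickUserAgent(candidates, rng = Math.random) {
  if (!Array.isArray(candidates) || candidates.length === 0) {
    return DEFAULT_UA;
  }
  const index = Math.floor(rng() * candidates.length);
  return candidates[index];
}
```

Usage inside a puppeteer script would be `await page.setUserAgent(pickUserAgent(myAgents));`, where `myAgents` is your own (assumed) list.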

Additionally, you can add these two extra plugins:

puppeteer-extra-plugin-recaptcha - Solves reCAPTCHAs automatically with a single line of code: page.solveRecaptchas()

NOTE: puppeteer-extra-plugin-recaptcha uses the paid 2captcha service.

puppeteer-extra-plugin-stealth - Applies various evasion techniques to make detection of headless puppeteer harder.
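A sketch of registering both plugins with puppeteer-extra, following the options shape shown in the puppeteer-extra-plugin-recaptcha README (a `provider` object with `id: '2captcha'` and your API `token`). The helper names are hypothetical, and reading the token from a `TWOCAPTCHA_TOKEN` environment variable is an assumption. The options builder is kept separate so it can be checked without launching a browser, and the packages are required lazily inside the launch function.

```javascript
// Build the options object for puppeteer-extra-plugin-recaptcha.
// Assumption: the caller supplies the 2captcha API key.
function buildRecaptchaOptions(token) {
  if (!token) {
    throw new Error('A 2captcha API token is required (paid service)');
  }
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true, // optional: highlight detected/solved reCAPTCHAs on the page
  };
}

// Register the stealth and reCAPTCHA plugins, then launch the browser.
async function launchWithPlugins(token) {
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

  puppeteer.use(StealthPlugin());
  puppeteer.use(RecaptchaPlugin(buildRecaptchaOptions(token)));
  return puppeteer.launch({ headless: true });
}

// Navigate to a page and ask the plugin to solve any reCAPTCHAs on it.
async function openAndSolve(browser, url) {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.solveRecaptchas(); // sends the captchas to the paid 2captcha service
  return page;
}
```

For example: `const browser = await launchWithPlugins(process.env.TWOCAPTCHA_TOKEN); const page = await openAndSolve(browser, url);`.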

Here is a list of things I'm doing to bypass CAPTCHAs and similar blocking:

  • Enable stealth mode (via puppeteer-extra-plugin-stealth)
  • Randomize the user agent or set a valid one (via random-useragent)
  • Randomize the viewport size
  • Skip loading images/styles/fonts for better performance
  • Pass the "WebDriver check"
  • Pass the "Chrome check"
  • Pass the "Notifications check"
  • Pass the "Plugins check"
  • Pass the "Languages check"
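Two of the items above, the randomized viewport and the skipped resource types, reduce to small pure functions that can be tested without a browser. This is a sketch: the ranges mirror the ones in this answer's code (width 1920 to 2019, height 3000 to 3099), while `randomViewport`, `shouldBlock`, `applyPageTweaks` and `BLOCKED_TYPES` are hypothetical names.

```javascript
// Resource types skipped for performance (images, stylesheets, fonts).
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Return a viewport with a small random offset, in the shape accepted by
// page.setViewport(). `rng` is injectable for deterministic tests.
function randomViewport(rng = Math.random) {
  return {
    width: 1920 + Math.floor(rng() * 100), // 1920..2019
    height: 3000 + Math.floor(rng() * 100), // 3000..3099
    deviceScaleFactor: 1,
    hasTouch: false,
    isLandscape: false,
    isMobile: false,
  };
}

// Decide whether an intercepted request should be aborted.
function shouldBlock(resourceType, blocked = BLOCKED_TYPES) {
  return blocked.has(resourceType);
}

// Wire both helpers into a Puppeteer page.
async function applyPageTweaks(page) {
  await page.setViewport(randomViewport());
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (shouldBlock(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
```

Keeping the blocklist in a `Set` makes it easy to add `'media'` or other types later without touching the request handler.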

Full code:

const randomUseragent = require('random-useragent');

//Enable stealth mode
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';

async function createPage (browser, url) {

    //Randomize User agent or Set a valid one
    const userAgent = randomUseragent.getRandom();
    const UA = userAgent || USER_AGENT;
    const page = await browser.newPage();

    //Randomize viewport size
    await page.setViewport({
        width: 1920 + Math.floor(Math.random() * 100),
        height: 3000 + Math.floor(Math.random() * 100),
        deviceScaleFactor: 1,
        hasTouch: false,
        isLandscape: false,
        isMobile: false,
    });

    await page.setUserAgent(UA);
    await page.setJavaScriptEnabled(true);
    page.setDefaultNavigationTimeout(0);

    //Skip images/styles/fonts loading for performance
    await page.setRequestInterception(true);
    page.on('request', (req) => {
        if (req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image') {
            req.abort();
        } else {
            req.continue();
        }
    });

    await page.evaluateOnNewDocument(() => {
        // Pass webdriver check
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false,
        });
    });

    await page.evaluateOnNewDocument(() => {
        // Pass chrome check
        window.chrome = {
            runtime: {},
            // etc.
        };
    });

    await page.evaluateOnNewDocument(() => {
        //Pass notifications check
        const permissions = window.navigator.permissions;
        const originalQuery = permissions.query;
        // Call the original with `permissions` as `this`; an unbound call throws "Illegal invocation"
        permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery.call(permissions, parameters)
        );
    });

    await page.evaluateOnNewDocument(() => {
        // Overwrite the `plugins` property to use a custom getter.
        Object.defineProperty(navigator, 'plugins', {
            // This just needs to have `length > 0` for the current test,
            // but we could mock the plugins too if necessary.
            get: () => [1, 2, 3, 4, 5],
        });
    });

    await page.evaluateOnNewDocument(() => {
        // Overwrite the `languages` property to use a custom getter.
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en'],
        });
    });

    await page.goto(url, { waitUntil: 'networkidle2', timeout: 0 });
    return page;
}
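To check whether overrides like these actually took effect, one option is to read the same properties back from the page and flag anything that still looks automated. This is a minimal sketch that assumes the four checks listed above (webdriver, chrome, plugins, languages) are the ones that matter for your target site; real detection scripts look at many more signals. `readFingerprint` and `automationSignals` are hypothetical helpers.

```javascript
// Collect the properties the overrides are meant to change.
// page.evaluate() serializes the arrow function and runs it inside the page.
async function readFingerprint(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    hasChrome: typeof window.chrome === 'object' && window.chrome !== null,
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || []),
  }));
}

// Return the names of the checks that would still flag the browser.
function automationSignals(fp) {
  const signals = [];
  if (fp.webdriver === true) signals.push('webdriver');
  if (!fp.hasChrome) signals.push('chrome');
  if (fp.pluginCount === 0) signals.push('plugins');
  if (fp.languages.length === 0) signals.push('languages');
  return signals;
}
```

For example, `console.log(automationSignals(await readFingerprint(await createPage(browser, url))));` prints an empty array when none of these four checks would fire.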

Have you tried setting the browser's user agent?

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');

After a few tests, a couple of packages helped me avoid reCAPTCHA:

//const puppeteer = require('puppeteer');
const puppeteerExtra = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
const randomUseragent = require('random-useragent');

class PuppeteerService {

    constructor() {
        this.browser = null;
        this.page = null;
        this.pageOptions = null;
        this.waitForFunction = null;
        this.isLinkCrawlTest = null;
    }

    async initiate(countsLimitsData, isLinkCrawlTest) {
        this.pageOptions = {
            waitUntil: 'networkidle2',
            timeout: countsLimitsData.millisecondsTimeoutSourceRequestCount
        };
        this.waitForFunction = 'document.querySelector("body")';
        puppeteerExtra.use(pluginStealth());
        //const browser = await puppeteerExtra.launch({ headless: false });
        this.browser = await puppeteerExtra.launch({ headless: false });
        this.page = await this.browser.newPage();
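        await this.page.setRequestInterception(true);
        // Abort images, stylesheets, fonts and scripts to speed up crawling.
        // Blocking 'script' also stops the page's own JavaScript from running.
        this.page.on('request', (request) => {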
            if (['image', 'stylesheet', 'font', 'script'].indexOf(request.resourceType()) !== -1) {
                request.abort();
            } else {
                request.continue();
            }
        });
        this.isLinkCrawlTest = isLinkCrawlTest;
    }

    async crawl(link) {
        const userAgent = randomUseragent.getRandom();
        const crawlResults = { isValidPage: true, pageSource: null };
        try {
            await this.page.setUserAgent(userAgent);
            await this.page.goto(link, this.pageOptions);
            await this.page.waitForFunction(this.waitForFunction);
            crawlResults.pageSource = await this.page.content();
        }
        catch (error) {
            crawlResults.isValidPage = false;
        }
        if (this.isLinkCrawlTest) {
            await this.close();
        }
        return crawlResults;
    }

    async close() {
        // Only close a browser that was actually launched
        if (this.browser) {
            await this.browser.close();
            this.browser = null;
        }
    }
}

const puppeteerService = new PuppeteerService();
module.exports = puppeteerService;
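A usage sketch for the exported service. When `isLinkCrawlTest` is `true`, `crawl()` closes the browser after the first link, so crawling several links in a row needs `false` plus an explicit `close()` at the end. `crawlAll` is a hypothetical helper; it takes the service as a parameter (for example `require('./puppeteerService')`, an assumed file name) so it can also be exercised with a stub.

```javascript
// Crawl several links sequentially with one PuppeteerService-like object.
async function crawlAll(service, links, timeoutMs = 30000) {
  // isLinkCrawlTest = false keeps the browser open between crawl() calls
  await service.initiate({ millisecondsTimeoutSourceRequestCount: timeoutMs }, false);
  const results = [];
  try {
    for (const link of links) {
      const { isValidPage, pageSource } = await service.crawl(link);
      results.push({ link, isValidPage, length: pageSource ? pageSource.length : 0 });
    }
  } finally {
    // Always release the browser, even if a crawl throws
    await service.close();
  }
  return results;
}
```

Crawling sequentially matters here because the service reuses a single `page`; running `crawl()` calls in parallel would make them navigate the same tab.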

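Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.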

 