
Bypassing CAPTCHAs with Headless Chrome using puppeteer

Google detects that my browser is being manipulated/controlled/automated by software, so I get a reCAPTCHA. When I launch Chromium manually and perform the same steps, no reCAPTCHA appears.

Question 1)

When using puppeteer, is it possible to solve the CAPTCHA programmatically or get rid of it altogether? Is there any way around this?

Question 2)

Does this happen only when running without the headless option, i.e.

const browser = await puppeteer.launch({
  headless: false
})

Or is this simply something we have to accept and move on?

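Try generating a random user agent with the user-agents npm package. This usually gets past protection that is based on the user agent.

In puppeteer, a page can override the browser's user agent with page.setUserAgent: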

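const UserAgent = require('user-agents');
...
// Create an instance first; toString() returns the user agent string.
const userAgent = new UserAgent();
await page.setUserAgent(userAgent.toString());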
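
To confirm that the override actually took effect, the user agent can be read back from inside the page. The following is a minimal sketch, not part of the original answer: the helper name applyUserAgent is hypothetical, and page is assumed to be an already-open puppeteer page. It relies only on page.setUserAgent and page.evaluate.

```javascript
// Hypothetical helper (not part of puppeteer): set a user agent on a page and
// read it back from the browser context to confirm the override applied.
async function applyUserAgent(page, userAgentString) {
  if (typeof userAgentString !== 'string' || userAgentString.trim() === '') {
    throw new TypeError('userAgentString must be a non-empty string');
  }
  await page.setUserAgent(userAgentString);
  // navigator.userAgent is evaluated inside the page, not in Node.
  const reported = await page.evaluate(() => navigator.userAgent);
  return reported === userAgentString;
}

// Usage, inside an async function with an open page:
// const ok = await applyUserAgent(page, userAgent.toString());
// if (!ok) console.warn('user agent override was not applied');
```

A false return value suggests that something else (for example a plugin) is rewriting the user agent afterwards.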

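In addition, you can add these two extra plugins:

puppeteer-extra-plugin-recaptcha - solves reCAPTCHAs automatically with a single line of code: page.solveRecaptchas()

Note: puppeteer-extra-plugin-recaptcha uses the paid service 2captcha.

puppeteer-extra-plugin-stealth - applies various evasion techniques to make headless puppeteer harder to detect.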
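
As a rough sketch of how the two plugins might be wired together: the function names recaptchaOptions and launchWithPlugins, the TWOCAPTCHA_TOKEN environment variable, and the example URL are all hypothetical. The option shape (provider id '2captcha' plus a token) follows the plugin's documented usage, but check it against the version you install.

```javascript
// Build the options object for puppeteer-extra-plugin-recaptcha.
// The token is your 2captcha API key; passing it as an argument is an assumption.
function recaptchaOptions(token) {
  if (!token) {
    throw new Error('a 2captcha API token is required');
  }
  return { provider: { id: '2captcha', token }, visualFeedback: true };
}

// Packages are required lazily so this file can be loaded (and the options
// helper used) even before the plugins are installed.
// Call this once per process: each call registers the plugins again.
async function launchWithPlugins(token, launchOptions = { headless: true }) {
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
  puppeteer.use(StealthPlugin());
  puppeteer.use(RecaptchaPlugin(recaptchaOptions(token)));
  return puppeteer.launch(launchOptions);
}

// Usage, inside an async function (hypothetical URL and environment variable):
// const browser = await launchWithPlugins(process.env.TWOCAPTCHA_TOKEN);
// const page = await browser.newPage();
// await page.goto('https://example.com/page-with-recaptcha');
// await page.solveRecaptchas();
// await browser.close();
```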

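Here is a list of the things I do to bypass CAPTCHAs and similar blocking:

  • Enable stealth mode (via puppeteer-extra-plugin-stealth)
  • Randomize the user agent or set a valid one (via random-useragent)
  • Randomize the viewport size
  • Skip image/stylesheet/font loading for better performance
  • Pass the "WebDriver check"
  • Pass the "Chrome check"
  • Pass the "notifications check"
  • Pass the "plugins check"
  • Pass the "languages check"

The original answer links to the full code, reproduced here: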

const randomUseragent = require('random-useragent');

//Enable stealth mode
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';

async function createPage (browser,url) {

    //Randomize User agent or Set a valid one
    const userAgent = randomUseragent.getRandom();
    const UA = userAgent || USER_AGENT;
    const page = await browser.newPage();

    //Randomize viewport size
    await page.setViewport({
        width: 1920 + Math.floor(Math.random() * 100),
        height: 3000 + Math.floor(Math.random() * 100),
        deviceScaleFactor: 1,
        hasTouch: false,
        isLandscape: false,
        isMobile: false,
    });

    await page.setUserAgent(UA);
    await page.setJavaScriptEnabled(true);
    await page.setDefaultNavigationTimeout(0);

    //Skip images/styles/fonts loading for performance
    await page.setRequestInterception(true);
    page.on('request', (req) => {
        if(req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image'){
            req.abort();
        } else {
            req.continue();
        }
    });

    await page.evaluateOnNewDocument(() => {
        // Pass webdriver check
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false,
        });
    });

    await page.evaluateOnNewDocument(() => {
        // Pass chrome check
        window.chrome = {
            runtime: {},
            // etc.
        };
    });

    await page.evaluateOnNewDocument(() => {
        //Pass notifications check
        const originalQuery = window.navigator.permissions.query;
        return window.navigator.permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery(parameters)
        );
    });

    await page.evaluateOnNewDocument(() => {
        // Overwrite the `plugins` property to use a custom getter.
        Object.defineProperty(navigator, 'plugins', {
            // This just needs to have `length > 0` for the current test,
            // but we could mock the plugins too if necessary.
            get: () => [1, 2, 3, 4, 5],
        });
    });

    await page.evaluateOnNewDocument(() => {
        // Overwrite the `languages` property to use a custom getter.
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en'],
        });
    });

    await page.goto(url, { waitUntil: 'networkidle2',timeout: 0 } );
    return page;
}
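
One way to check whether the evasions in createPage took effect is to read the patched properties back from the page. This is only a sketch under assumptions: collectFingerprint and failedChecks are hypothetical helper names, and the checks mirror just the four properties patched above, not what any real detection service inspects.

```javascript
// Runs inside the page (pass it to page.evaluate): collect the properties
// that the evasions in createPage patch.
function collectFingerprint() {
  return {
    webdriver: navigator.webdriver,
    hasChrome: typeof window.chrome === 'object' && window.chrome !== null,
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || []),
  };
}

// Runs in Node: list the checks that still look automated.
function failedChecks(fp) {
  const failed = [];
  if (fp.webdriver) failed.push('webdriver');
  if (!fp.hasChrome) failed.push('chrome');
  if (!(fp.pluginCount > 0)) failed.push('plugins');
  if (!Array.isArray(fp.languages) || fp.languages.length === 0) {
    failed.push('languages');
  }
  return failed;
}

// Usage, inside an async function with a page returned by createPage:
// const failed = failedChecks(await page.evaluate(collectFingerprint));
// if (failed.length > 0) console.warn('still detectable via:', failed);
```

An empty result only means these particular properties look normal; a site may rely on other signals.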

Have you tried setting the browser's user agent?

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');

After several tests, a few packages helped me avoid the reCAPTCHA:

//const puppeteer = require('puppeteer');
const puppeteerExtra = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
const randomUseragent = require('random-useragent');

class PuppeteerService {

    constructor() {
        this.browser = null;
        this.page = null;
        this.pageOptions = null;
        this.waitForFunction = null;
        this.isLinkCrawlTest = null;
    }

    async initiate(countsLimitsData, isLinkCrawlTest) {
        this.pageOptions = {
            waitUntil: 'networkidle2',
            timeout: countsLimitsData.millisecondsTimeoutSourceRequestCount
        };
        this.waitForFunction = 'document.querySelector("body")';
        puppeteerExtra.use(pluginStealth());
        //const browser = await puppeteerExtra.launch({ headless: false });
        this.browser = await puppeteerExtra.launch({ headless: false });
        this.page = await this.browser.newPage();
        await this.page.setRequestInterception(true);
        this.page.on('request', (request) => {
            if (['image', 'stylesheet', 'font', 'script'].indexOf(request.resourceType()) !== -1) {
                request.abort();
            } else {
                request.continue();
            }
        });
        this.isLinkCrawlTest = isLinkCrawlTest;
    }

    async crawl(link) {
        const userAgent = randomUseragent.getRandom();
        const crawlResults = { isValidPage: true, pageSource: null };
        try {
            await this.page.setUserAgent(userAgent);
            await this.page.goto(link, this.pageOptions);
            await this.page.waitForFunction(this.waitForFunction);
            crawlResults.pageSource = await this.page.content();
        }
        catch (error) {
            crawlResults.isValidPage = false;
        }
        if (this.isLinkCrawlTest) {
            // Wait for the browser to shut down before returning the results.
            await this.close();
        }
        return crawlResults;
    }

    async close() {
        // Close the browser only if one was actually launched.
        if (this.browser) {
            await this.browser.close();
        }
    }
}

const puppeteerService = new PuppeteerService();
module.exports = puppeteerService;
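
Note that the service above also aborts 'script' requests, which may keep JavaScript-rendered pages from producing their content. A minimal sketch of a configurable alternative follows; the factory name makeRequestFilter and its default list are assumptions, not part of the answer.

```javascript
// Hypothetical factory: returns a 'request' handler that aborts the given
// resource types and lets every other request through.
function makeRequestFilter(blockedTypes = ['image', 'stylesheet', 'font']) {
  const blocked = new Set(blockedTypes);
  return (request) => {
    if (blocked.has(request.resourceType())) {
      return request.abort();
    }
    return request.continue();
  };
}

// Usage with a puppeteer page, inside an async function:
// await page.setRequestInterception(true);
// page.on('request', makeRequestFilter());
// To reproduce the behaviour of the service above:
// page.on('request', makeRequestFilter(['image', 'stylesheet', 'font', 'script']));
```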

