
How to avoid hCaptcha showing image challenges while using Puppeteer for web scraping

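I am trying to scrape a website. However, when I try to get past the captcha by clicking its checkbox, it sometimes presents me with images to solve. Other times it simply passes after the click and navigates me to the page.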

Below is the code showing how I set up my Puppeteer instance and page.

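  // Requires puppeteer-extra, puppeteer-extra-plugin-stealth and random-useragent
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const randomUseragent = require('random-useragent');

  // Sleep helper used by the captcha step below
  const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  puppeteer.use(StealthPlugin());

  const chromeOptions = {
    headless: false,
    ignoreHTTPSErrors: true,
    slowMo: 30,
    args: ['--no-sandbox'],
  };

  // The statements below run inside an async function
  const browser = await puppeteer.launch(chromeOptions);
  const page = await browser.newPage();
  // Remove the navigator.webdriver flag before any page script runs
  await page.evaluateOnNewDocument(() => {
    delete navigator.__proto__.webdriver;
  });

  await page.setUserAgent(randomUseragent.getRandom());
  await page.setJavaScriptEnabled(true);
  //page.setDefaultNavigationTimeout(0);
  await page.goto(`pagetoscrape`, {
    waitUntil: "domcontentloaded",
  });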
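Why the `delete navigator.__proto__.webdriver` line targets the prototype can be shown without a browser. In Chrome, `navigator.webdriver` is a getter defined on `Navigator.prototype` (like other Web IDL attributes), so deleting it from the `navigator` object itself has no effect. The sketch below uses a stand-in `FakeNavigator` class (hypothetical, not the real DOM object) to reproduce that layout in plain Node.js. Note that puppeteer-extra-plugin-stealth already ships a `navigator.webdriver` evasion, so with the plugin enabled this line is probably redundant.

```javascript
// Stand-in for the browser's Navigator interface (NOT the real DOM object).
// Web IDL attributes such as `webdriver` are configurable getters on the
// prototype, which this class reproduces.
class FakeNavigator {}
Object.defineProperty(FakeNavigator.prototype, 'webdriver', {
  get() {
    return true; // what an automated Chrome reports
  },
  configurable: true,
  enumerable: true,
});

const nav = new FakeNavigator();

// Deleting from the instance does nothing: there is no own property to remove.
const deletedOwn = delete nav.webdriver;
const afterOwnDelete = nav.webdriver; // getter on the prototype still answers

// Deleting from the prototype (what `nav.__proto__` points to) removes the getter.
delete Object.getPrototypeOf(nav).webdriver;
const afterProtoDelete = nav.webdriver;

console.log(deletedOwn, afterOwnDelete, afterProtoDelete, 'webdriver' in nav);
```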

Below is how I try to solve the captcha.

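  // Give the hCaptcha widget time to load
  await delay(6000);
  // The hCaptcha checkbox lives inside an iframe; this takes the first iframe on the page
  const iframe = await page.$('iframe');
  const frame = await iframe.contentFrame();
  const checkbox = await frame.$('#checkbox');
  await checkbox.click();
  await delay(5000);
  await page.screenshot({path: 'headless-test-result.png'});
  console.log("Solving captcha........");
  // Resolves only when the click is accepted and the site navigates on its own
  await page.waitForNavigation();
  await delay(7000);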
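The script above relies on fixed `delay()` calls (6 s, 5 s, 7 s), which waste time when the widget is fast and are too short when it is slow. A common alternative is to poll a condition until it holds or a timeout expires. Below is a minimal sketch; `pollUntil` is a hypothetical helper written for illustration, not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): calls `check` repeatedly until it
// returns a truthy value, or throws once `timeoutMs` has elapsed.
async function pollUntil(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await check();
    if (value) return value; // condition met: hand the value back to the caller
    if (Date.now() >= deadline) {
      throw new Error(`pollUntil: condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Stand-in condition that becomes truthy on the third check
let calls = 0;
const ready = () => Promise.resolve(++calls >= 3 ? 'ready' : null);

pollUntil(ready, { timeoutMs: 2000, intervalMs: 10 }).then((v) => console.log(v, calls));
```

In the question's script, `const iframe = await pollUntil(() => page.$('iframe'))` could stand in for the first fixed `delay(6000)`, since `page.$` resolves to `null` until a matching element exists. For conditions checked inside the page, Puppeteer's own `page.waitForSelector()` and `page.waitForFunction()` are usually the better fit.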
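  • The captcha you are trying to solve is an hCaptcha. You can use the following library to solve it:

     https://www.npmjs.com/package/puppeteer-extra-plugin-recaptcha
  • The library uses https://2captcha.com/ , a paid service. After purchasing it you receive a 2CAPTCHA API KEY, which is used in the code.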
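Since the script below already loads `dotenv`, the 2captcha key can be read from the environment instead of being hard-coded. The sketch below builds the options object passed to `RecaptchaPlugin(...)` and fails fast when the key is missing. `TWOCAPTCHA_API_KEY` and `buildRecaptchaOptions` are hypothetical names; use whatever your `.env` file defines.

```javascript
// Builds the options for RecaptchaPlugin(...). TWOCAPTCHA_API_KEY is a
// hypothetical variable name, read from process.env by default.
function buildRecaptchaOptions(env = process.env) {
  const token = env.TWOCAPTCHA_API_KEY;
  if (!token) {
    // Fail fast instead of sending requests to 2captcha with a placeholder key
    throw new Error('TWOCAPTCHA_API_KEY is not set');
  }
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true, // highlight detected/solved captchas on the page
  };
}

// Usage in the real script: puppeteer.use(RecaptchaPlugin(buildRecaptchaOptions()));
const opts = buildRecaptchaOptions({ TWOCAPTCHA_API_KEY: 'example-key' });
console.log(opts.provider.id);
```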


const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
require('dotenv').config();

// Disable the stealth plugin's user-agent override evasion
const stealth = StealthPlugin();
stealth.enabledEvasions.delete('user-agent-override');

// page.waitFor() was removed from newer Puppeteer releases, so use a plain sleep
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

puppeteer.use(
    RecaptchaPlugin({
        provider: {
            id: '2captcha',
            token: 'YOUR 2captcha API KEY', // REPLACE THIS WITH YOUR OWN 2CAPTCHA API KEY ⚡
        },
        visualFeedback: true // highlight detected/solved captchas on the page
    })
);
puppeteer.use(stealth);

(async () => {
    const browser = await puppeteer.launch({
        headless: false,
        // Relax same-origin isolation so the plugin can reach into the captcha iframe
        args: ['--disable-web-security', '--disable-features=IsolateOrigins,site-per-process']
    });
    const page = await browser.newPage();

    await page.setExtraHTTPHeaders({
        'accept-language': 'en-US,en;q=0.9,hy;q=0.8'
    });

    // Open the page that shows the hCaptcha (goto already waits for the load event)
    await page.goto('YOUR WEBSITE URL IN WHICH YOU WANT TO SOLVE hCaptcha');

    // Site-specific steps: open the signup form and fill it in
    await page.waitForSelector('.signup_menu_button');
    await page.click('.signup_menu_button');

    await page.waitForSelector('#signup_form_email');
    await page.click('#signup_form_email');
    await page.type('#signup_form_email', 'YOUR EMAIL');

    await page.waitForSelector('#signup_form_password');
    await page.click('#signup_form_password');
    await page.type('#signup_form_password', 'YOUR PASSWORD');

    try {
        // Finds reCAPTCHA/hCaptcha widgets, sends them to 2captcha and injects the solutions
        const captcha_response = await page.solveRecaptchas();
        console.log("captcha_response:", captcha_response);

        await sleep(500);

        // Click one of the matched signup buttons to close the modal
        await page.evaluate(() => {
            const allDivs = document.querySelectorAll('#signup_button');
            const randomElement = allDivs[Math.floor(Math.random() * allDivs.length)];
            if (randomElement) randomElement.click(); // guard against no match
        });

        await page.waitForSelector('#signup_button');
        await page.click('#signup_button');

        await sleep(500);
    } catch (err) {
        console.log("hcaptcha error==>", err);
    }
})();

Note: the function is named solveRecaptchas(), but it also solves hCaptcha.
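Rather than only logging the object returned by `page.solveRecaptchas()`, it can be checked to confirm an hCaptcha was actually solved. The sketch below assumes the result shape described in the plugin's README (`captchas`, `filtered`, `solutions`, `solved`, `error`, with entries carrying a `_vendor` of `'recaptcha'` or `'hcaptcha'` and solved entries an `isSolved` flag); verify this against the version you install. `summarizeSolveResult` is a hypothetical helper, and the sample input is made up.

```javascript
// ASSUMED result shape (per the plugin README, not verified for every version):
// { captchas, filtered, solutions, solved, error }, entries tagged with `_vendor`.
function summarizeSolveResult(result) {
  const solved = (result.solved || []).filter((s) => s.isSolved);
  return {
    found: (result.captchas || []).length,
    solvedHcaptcha: solved.filter((s) => s._vendor === 'hcaptcha').length,
    solvedTotal: solved.length,
    error: result.error || null,
  };
}

// Made-up sample shaped like a successful hCaptcha solve
const sample = {
  captchas: [{ _vendor: 'hcaptcha', id: 'abc' }],
  filtered: [],
  solutions: [{ _vendor: 'hcaptcha', id: 'abc', text: 'example-token' }],
  solved: [{ _vendor: 'hcaptcha', id: 'abc', isSolved: true }],
};
console.log(summarizeSolveResult(sample));
```

Inside the answer's `try` block, throwing when `solvedHcaptcha` is `0` would route a failed solve to the existing `catch` instead of continuing to click the signup button.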
