How to avoid hCaptcha showing image challenges while using Puppeteer for web scraping
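I am trying to scrape a website. However, when I click the captcha checkbox to get past the captcha, it presents an image challenge to solve. This only happens some of the time; other times the click alone passes the captcha and I am navigated to the page.
Below is the code that sets up my puppeteer instance and page.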
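// Assumed imports (not shown in the question); this code runs inside an async function.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const randomUseragent = require('random-useragent');

puppeteer.use(StealthPlugin());

const chromeOptions = {
  headless: false,
  ignoreHTTPSErrors: true,
  slowMo: 30, // slow each Puppeteer operation down by 30 ms
  args: ['--no-sandbox'],
};
const browser = await puppeteer.launch(chromeOptions);
const page = await browser.newPage();

// Remove navigator.webdriver before any page script runs
await page.evaluateOnNewDocument(() => {
  delete Object.getPrototypeOf(navigator).webdriver;
});
await page.setUserAgent(randomUseragent.getRandom());
await page.setJavaScriptEnabled(true);
// page.setDefaultNavigationTimeout(0);
await page.goto(`pagetoscrape`, {
  waitUntil: 'domcontentloaded',
});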
Below is how I try to solve the captcha.
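// delay(ms) is a helper that is not shown in the question
await delay(6000);
// Take the first iframe on the page (assumed to be the hCaptcha widget) and click its checkbox
const iframe = await page.$('iframe');
const frame = await iframe.contentFrame();
const checkbox = await frame.$('#checkbox');
await checkbox.click();
await delay(5000);
await page.screenshot({ path: 'headless-test-result.png' });
console.log('Solving captcha...');
await page.waitForNavigation();
await delay(7000);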
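The snippet above calls delay(), which the question never defines. A minimal sketch of what such a helper might look like, assuming it is simply a promise wrapper around setTimeout (the asker's actual implementation is unknown):

```javascript
// Hypothetical helper: the question calls delay(ms) without showing its definition.
// Returns a promise that resolves after roughly `ms` milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Inside an async function it would be used as in the question, for example `await delay(6000);`.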
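The captcha you are trying to solve is an hCaptcha. You can use the following library to solve it.
https://www.npmjs.com/package/puppeteer-extra-plugin-recaptcha
This library uses https://2captcha.com/, which is a paid service; after purchasing it you receive a 2Captcha API key. That key is used in the code.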
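Since the key is a paid credential, it may be preferable to keep it out of the source file. The sketch below builds the plugin options from an environment variable; the variable name TWO_CAPTCHA_API_KEY and the function name buildRecaptchaOptions are assumptions made for this example, not part of the library.

```javascript
// Sketch: build the options object passed to RecaptchaPlugin(...) from an
// environment variable instead of a hard-coded string.
// TWO_CAPTCHA_API_KEY is a hypothetical variable name chosen for this example.
function buildRecaptchaOptions(env = process.env) {
  const token = env.TWO_CAPTCHA_API_KEY;
  if (!token) {
    throw new Error('TWO_CAPTCHA_API_KEY is not set');
  }
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true,
  };
}
```

With dotenv loaded first, the result could be passed straight to the plugin, as in `puppeteer.use(RecaptchaPlugin(buildRecaptchaOptions()))`.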
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
require('dotenv').config(); // loads variables from a .env file, if present

// Use the stealth plugin, but without its user-agent override evasion
const stealth = StealthPlugin();
stealth.enabledEvasions.delete('user-agent-override');

puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'YOUR 2captcha API KEY', // REPLACE THIS WITH YOUR OWN 2CAPTCHA API KEY
    },
    visualFeedback: true,
  })
);
puppeteer.use(stealth);

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    // Relax origin isolation so the captcha inside a cross-origin iframe is reachable
    args: ['--disable-web-security', '--disable-features=IsolateOrigins,site-per-process'],
  });
  const page = await browser.newPage();
  await page.setExtraHTTPHeaders({
    'accept-language': 'en-US,en;q=0.9,hy;q=0.8',
  });

  // Open the app URL; page.goto() already waits for the load event
  await page.goto('YOUR WEBSITE URL IN WHICH YOU WANT TO SOLVE hCaptcha');

  // Open the signup form and fill it in
  await page.waitForSelector('.signup_menu_button');
  await page.click('.signup_menu_button');
  await page.waitForSelector('#signup_form_email');
  await page.click('#signup_form_email');
  await page.type('#signup_form_email', 'YOUR EMAIL');
  await page.waitForSelector('#signup_form_password');
  await page.click('#signup_form_password');
  await page.type('#signup_form_password', 'YOUR PASSWORD');

  try {
    // Detect the captchas on the page and solve them through the configured provider
    const captchaResponse = await page.solveRecaptchas();
    console.log('captchaResponse:', captchaResponse);
    // page.waitFor() is gone from recent Puppeteer releases, so pause with a plain timer
    await new Promise((resolve) => setTimeout(resolve, 500));

    // Click on signup to close the modal
    await page.evaluate(() => {
      const buttons = document.querySelectorAll('#signup_button');
      const randomElement = buttons[Math.floor(Math.random() * buttons.length)];
      randomElement.click();
    });
    await page.waitForSelector('#signup_button');
    await page.click('#signup_button');
    await new Promise((resolve) => setTimeout(resolve, 500));
  } catch (err) {
    console.log('hcaptcha error:', err);
  }
})();
Note: the function is named solveRecaptchas(), but it solves hCaptcha as well.
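To check which kind of captcha was actually solved, the value returned by solveRecaptchas() can be inspected. The sketch below is a hypothetical helper (summarizeSolveResult is not part of the plugin) and it assumes the result contains a `captchas` array, a `solved` array whose entries carry `_vendor` and `isSolved`, and an `error` field; verify that shape against the plugin version you have installed.

```javascript
// Sketch: summarize the value returned by page.solveRecaptchas().
// Assumed shape (verify against your plugin version):
//   { captchas: [...], solved: [{ _vendor, id, isSolved }], error }
function summarizeSolveResult(result) {
  const { captchas = [], solved = [], error } = result || {};
  const solvedByVendor = {};
  for (const entry of solved) {
    if (!entry.isSolved) continue; // count only entries reported as solved
    const vendor = entry._vendor || 'unknown';
    solvedByVendor[vendor] = (solvedByVendor[vendor] || 0) + 1;
  }
  return { found: captchas.length, solvedByVendor, hasError: Boolean(error) };
}
```

Logging `summarizeSolveResult(await page.solveRecaptchas())` would then show, per vendor, how many captchas were reported as solved.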