How can I scrape a website without getting detected, and bypass reCAPTCHA, using Selenium WebDriver through Python?

I know web scraping, and I have scraped data from different websites using Python and Selenium WebDriver with Chrome. But when I open one particular website, the front page loads; as soon as I click or go to any other page, the website restricts me, and it knows that I am using automated Chrome.

This may be because the website uses reCAPTCHA v3, which "allows you to verify if an interaction is legitimate without any user interaction". This means that they can identify that you are not a human without asking you to tick the famous "I'm not a robot" box. That box belongs to the previous version, reCAPTCHA v2.
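It helps to look at this from the site's side. With v3, the page obtains a token in the background and the site's server sends it to Google's `siteverify` endpoint, which answers with JSON containing `success`, a `score` between 0.0 and 1.0, and the `action` name. What the site does with that score is its own decision. A minimal sketch of such a check, assuming a hypothetical helper `is_probably_human` and the 0.5 starting threshold Google's documentation suggests (real sites may use other thresholds or extra signals):

```python
from typing import Any, Mapping


# Hypothetical site-side helper: decides whether an already-parsed reCAPTCHA v3
# "siteverify" response looks like a human interaction.
def is_probably_human(
    verify_response: Mapping[str, Any],
    expected_action: str,
    threshold: float = 0.5,  # suggested starting point; each site tunes this
) -> bool:
    # success is False when the token is invalid, expired, or already used
    if not verify_response.get("success", False):
        return False
    # The token must have been issued for the action this endpoint expects
    if verify_response.get("action") != expected_action:
        return False
    # score ranges from 0.0 (very likely a bot) to 1.0 (very likely human)
    return float(verify_response.get("score", 0.0)) >= threshold


# Example responses shaped like the documented siteverify fields
print(is_probably_human({"success": True, "score": 0.9, "action": "login"}, "login"))  # True
print(is_probably_human({"success": True, "score": 0.1, "action": "login"}, "login"))  # False
```

A Selenium-driven session that scores below the site's threshold would simply be refused or redirected, with no visible challenge, which fits the behavior described in the question.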

Read more about reCAPTCHA here: https://developers.google.com/recaptcha/docs/versions

I don't think it's possible to work around this with Selenium. Also, as others have pointed out, web scraping can be illegal, depending on the site's terms of service and your jurisdiction.

These days, websites can detect your program as a bot fairly easily. Currently Google offers four reCAPTCHA types to choose from when registering a new site:

  • reCAPTCHA v3
  • reCAPTCHA v2 ("I'm not a robot" Checkbox)
  • reCAPTCHA v2 (Invisible reCAPTCHA badge)
  • reCAPTCHA v2 (Android)
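To find out which of the web variants a particular page uses, you can inspect the markup the browser receives (for example `driver.page_source` in Selenium). The sketch below is only a heuristic based on the standard embed snippets; the function name and its return labels are hypothetical, Enterprise or custom integrations may look different, and the Android variant lives inside apps, so it never appears in page HTML.

```python
import re

# v3 loads api.js with the site key in ?render=..., whereas v2 uses
# render=explicit or render=onload (or omits the parameter entirely).
_V3_SCRIPT = re.compile(r"recaptcha/api\.js\?[^\"'>]*render=(?!explicit|onload)")
# v2 widgets are placeholder elements whose class list contains "g-recaptcha"
_V2_WIDGET = re.compile(r"class\s*=\s*[\"'](?:[^\"']*\s)?g-recaptcha[\"'\s]")
# The invisible v2 badge is requested with data-size="invisible"
_INVISIBLE = re.compile(r"data-size\s*=\s*[\"']invisible[\"']")


def guess_recaptcha_variant(html: str) -> str:
    """Return "v3", "v2-invisible", "v2-checkbox", "unknown", or "none"."""
    if _V3_SCRIPT.search(html):
        return "v3"
    if _V2_WIDGET.search(html):
        return "v2-invisible" if _INVISIBLE.search(html) else "v2-checkbox"
    if "recaptcha/api.js" in html:
        # Script is loaded but the widget is rendered from JavaScript,
        # so the static markup does not reveal which variant is used
        return "unknown"
    return "none"
```

If the result is `"v3"`, there is no box to click at all, which is consistent with being blocked silently.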

Solution

However, there are some generic approaches to avoid getting detected while web scraping:
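One generic approach of this kind is to avoid machine-regular timing: a script that loads a new page at exactly the same interval every time is easy to spot, while a person pauses for varying lengths. A minimal sketch, assuming hypothetical helpers `jittered_delays` and `visit_paced` (neither is part of Selenium) and arbitrary defaults of 2 to 5 seconds between pages. It removes one obvious signal and reduces load on the server; it is not a way around reCAPTCHA.

```python
import random
import time
from typing import Callable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def jittered_delays(base: float, jitter: float,
                    rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield an endless stream of delays drawn uniformly from [base, base + jitter] seconds."""
    if rng is None:
        rng = random.Random()
    while True:
        yield base + rng.uniform(0.0, jitter)


def visit_paced(urls: Sequence[str], open_page: Callable[[str], T],
                base: float = 2.0, jitter: float = 3.0,
                rng: Optional[random.Random] = None,
                sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call open_page(url) for each URL, sleeping a randomized delay between visits."""
    delays = jittered_delays(base, jitter, rng)
    results: List[T] = []
    for i, url in enumerate(urls):
        if i > 0:  # no wait before the very first page
            sleep(next(delays))
        results.append(open_page(url))
    return results
```

With Selenium, `open_page` could be a small function that calls `driver.get(url)` and returns `driver.page_source`. Passing `sleep` and `rng` as parameters keeps the pacing logic testable without actually waiting.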

Outro

See:

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM