
How can I scrape a website without getting detected and bypassing reCAPTCHA using selenium webdriver through Python?

Original 2019-03-13 09:43:58 7 2 Tags: python, selenium-webdriver, web-scraping, recaptcha, webdriver-w3c-spec

I know web scraping, and I have extracted data from several websites using Python and Selenium WebDriver with Chrome. But when I open one particular website, the front page loads fine; as soon as I click a link or navigate to any other page, the website restricts me, because it knows I am using an automated Chrome.

2 answers

This may be because the website uses reCAPTCHA v3, which "allows you to verify if an interaction is legitimate without any user interaction". This means the site can determine that you are not a human without asking you to tick the famous "I'm not a robot" box. That checkbox belongs to the previous version, reCAPTCHA v2.

Read more about reCAPTCHA here: https://developers.google.com/recaptcha/docs/versions

I don't think it's possible to work around this with Selenium. And, as others have already mentioned, web scraping is often illegal.

These days, websites can detect fairly easily that your program is a bot. Currently, Google offers four reCAPTCHA types to choose from and implement when registering a new site:

reCAPTCHA v3
reCAPTCHA v2 ("I'm not a robot" Checkbox)
reCAPTCHA v2 (Invisible reCAPTCHA badge)
reCAPTCHA v2 (Android)

Solution

However, there are some generic approaches to avoid getting detected while web scraping:

The first and foremost attribute through which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional (default) viewport.
If you need to send multiple requests to a website, keep changing the User-Agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in How to sleep webdriver in python for milliseconds.
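The three suggestions above can be sketched with the standard library alone. This is a minimal illustration, not the answerer's code: `USER_AGENTS`, `WINDOW_SIZES`, `chrome_arguments`, `argument_sets` and `human_pause` are hypothetical names, and the user-agent strings and window sizes are example values. One assumption to note: Chrome's `--user-agent` switch is applied when the browser launches, so this sketch rotates the user agent per browser session rather than per individual request.

```python
import random
import time
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical example values; replace them with ones that suit your use case.
USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
)
WINDOW_SIZES: Sequence[Tuple[int, int]] = ((1366, 768), (1440, 900), (1536, 864))


def chrome_arguments(user_agent: str, width: int, height: int) -> List[str]:
    """Build Chrome command-line switches for a custom window size and user agent."""
    return [f"--window-size={width},{height}", f"--user-agent={user_agent}"]


def argument_sets(seed: Optional[int] = None) -> Iterator[List[str]]:
    """Yield one argument list per browser session.

    User agents are cycled in order; the window size is picked at random.
    """
    rng = random.Random(seed)
    session = 0
    while True:
        user_agent = USER_AGENTS[session % len(USER_AGENTS)]
        width, height = rng.choice(WINDOW_SIZES)
        yield chrome_arguments(user_agent, width, height)
        session += 1


def human_pause(min_secs: float = 1.0, max_secs: float = 3.0, rng=random) -> float:
    """Sleep for a random duration between min_secs and max_secs; return the delay."""
    if min_secs < 0 or max_secs < min_secs:
        raise ValueError("require 0 <= min_secs <= max_secs")
    delay = rng.uniform(min_secs, max_secs)
    time.sleep(delay)
    return delay
```

With Selenium, each yielded list could be applied by calling `options.add_argument(arg)` on a `webdriver.ChromeOptions()` instance before creating `webdriver.Chrome(options=options)`. `human_pause()` would then be called between page interactions, in addition to (not instead of) `WebDriverWait`. These measures only remove some obvious fingerprints. As the first answer notes, they are not a reliable way around reCAPTCHA v3.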

Outro

See:

How can I scrape a website that does not show any HTML codes in the source using Python without Selenium

How do I scrape ::before element in a website using selenium python

Python - Selenium Webdriver and reCAPTCHA

How can I select the “Sort By” element through Selenium Webdriver Python

How can I scrape hidden elements without using selenium?

How can I scrape WHO influenza data without using Selenium?

How to scrape all the search results using Selenium webdriver and Python

How do I scrape multiple URLs using Selenium WebDriver?

Scrape website data without BS or selenium (Python)

How can I open async several selenium webdriver using python

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.