
Unable to open a webpage through Selenium in Python

I am new to Selenium with Python and am trying to scrape data from a website. Below is my code, in which I have taken several precautions to avoid being blocked (random user agents, rotating proxies, and hiding the automation flags).

from random import randrange
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Return a random user agent from user_agents.txt (one per line).
def generate_user_agent():
    # Skip blank lines so a trailing newline cannot yield an empty user agent.
    with open("user_agents.txt", "r") as user_agents_file:
        user_agents = [line.strip() for line in user_agents_file if line.strip()]
    return user_agents[randrange(len(user_agents))]

# Return a random proxy entry from the proxy list file (one per line).
def generate_ip_address():
    # Skip blank lines so a trailing newline cannot yield an empty proxy.
    with open("proxyscrape_premium_http_proxies.txt", "r") as proxies_file:
        proxies = [line.strip() for line in proxies_file if line.strip()]
    return proxies[randrange(len(proxies))]

#Function to create and set chrome options.
def set_chrome_options():
    proxy = generate_ip_address()
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument("--incognito")
    options.add_argument(f'--proxy-server={proxy}')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    return options, proxy

#Function to create a webdriver object and set its properties.
def create_webdriver():
    options, proxy = set_chrome_options()
    userAgent = generate_user_agent()
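    # Route HTTP, FTP and SSL traffic through the same proxy.
    webdriver.DesiredCapabilities.CHROME['proxy'] = {
        "httpProxy": proxy,
        "ftpProxy": proxy,
        "sslProxy": proxy,
        "proxyType": "MANUAL",
    }
    webdriver.DesiredCapabilities.CHROME['acceptSslCerts'] = True
    # executable_path works in Selenium 3; Selenium 4 deprecates it in favor of a Service object.
    driver = webdriver.Chrome(options=options, executable_path=r'chromedriver.exe')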
    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": userAgent})
    return driver

url = 'http://www.doctolib.de/impfung-covid-19-corona/berlin'
driver = create_webdriver()
driver.get(url)

The page does not load in the browser that Selenium opens, although it opens normally in a regular browser. Below is a screenshot of the browser window when I run the code.

Please let me know if I am missing something. Any help would be highly appreciated.

PS: I am using premium proxies for IP rotation.

[Screenshot: Browser_output]

I've had a similar experience in the past, where the website detected that Selenium was being used even after I tried several methods such as IP rotation, User-Agent rotation, and proxies.

I would suggest using the undetected_chromedriver library.

pip install undetected-chromedriver

It is able to load the website without any problem. The code snippet is given below:

# Releases 3.1.0 and later expose this API at the top level: import undetected_chromedriver as uc
import undetected_chromedriver.v2 as uc
driver = uc.Chrome()
with driver:
    driver.get('http://www.doctolib.de/impfung-covid-19-corona/berlin')
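If you would rather stay on plain Selenium, one detail in the question's code is worth noting: driver.execute_script only changes navigator.webdriver on the document that is currently loaded, so the override is lost as soon as driver.get(url) loads a new page. A minimal sketch of an alternative, assuming a Chromium-based driver that provides execute_cdp_cmd, is to register the script with the CDP command Page.addScriptToEvaluateOnNewDocument before navigating. The helper name apply_overrides and the constant HIDE_WEBDRIVER_JS are hypothetical, and this alone may well not be enough to get past the site's bot detection.

```python
# Hypothetical helper: works with any driver object that provides
# execute_cdp_cmd (Selenium's Chromium-based drivers do).
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def apply_overrides(driver, user_agent):
    """Register overrides that apply to documents loaded afterwards."""
    # Ask the browser to run the script at the start of every new document,
    # before the page's own scripts execute.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": HIDE_WEBDRIVER_JS}
    )
    # Same CDP call the question already uses for the user agent.
    driver.execute_cdp_cmd(
        "Network.setUserAgentOverride", {"userAgent": user_agent}
    )
    return driver


# Usage (not run here; needs Chrome and a matching chromedriver):
#   driver = webdriver.Chrome(options=options)
#   apply_overrides(driver, generate_user_agent())
#   driver.get(url)
```

The helper takes the driver as a parameter rather than creating it, so it can be called from the question's create_webdriver() right after the driver is constructed and before the first driver.get().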

I was having a similar issue with Firefox on Linux. I deleted the log file created by geckodriver, which had grown quite large for a text file (4.8 MB), and everything started working again.
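A minimal sketch of that cleanup, in case it helps to run it before starting the driver. It assumes the log is named geckodriver.log in the working directory (adjust the path if yours is written elsewhere) and uses an arbitrary 1 MB threshold; remove_large_log is a hypothetical helper name.

```python
from pathlib import Path


def remove_large_log(log_path="geckodriver.log", max_bytes=1_000_000):
    """Delete log_path if it exists and is larger than max_bytes.

    Returns True if the file was deleted, False otherwise.
    The default path and threshold are assumptions, not fixed values.
    """
    path = Path(log_path)
    if path.is_file() and path.stat().st_size > max_bytes:
        path.unlink()
        return True
    return False


# Usage: call once before creating the Firefox driver.
#   remove_large_log()
```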

