
Running Selenium with Headless Chrome Webdriver

So I'm trying some stuff out with selenium and I really want it to be quick.

So my thought is that running it with headless chrome would make my script faster.

First, is that assumption correct, or does it not matter whether I run my script with a headless driver?

Either way, I still want to get it running headless, but somehow I can't. I tried different things, and most suggestions said it would work as described in the October update here:

How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?

But when I try that, I get weird console output and it still doesn't seem to work.

Any tips appreciated.

To run Chrome headless, just add --headless via chrome_options.add_argument, i.e.:

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
#chrome_options.add_argument("--disable-extensions")
#chrome_options.add_argument("--disable-gpu")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")
# chrome_options.headless = True # also works in older Selenium releases (deprecated since Selenium 4.8)
driver = webdriver.Chrome(options=chrome_options)
start_url = "https://duckgo.com"
driver.get(start_url)
print(driver.page_source.encode("utf-8"))
# b'<!DOCTYPE html><html xmlns="http://www....
driver.quit()

On "So my thought is that running it with headless chrome would make my script faster":

Try Chrome options such as --disable-extensions or --disable-gpu and benchmark the result, but I wouldn't count on much improvement.
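To make "benchmark it" concrete, here is a minimal timing sketch. The names benchmark and make_chrome are hypothetical helpers, not part of Selenium, and the harness only assumes that the object returned by the factory has get() and quit() methods, as a Selenium driver does. Selenium is imported inside make_chrome so the timing helper itself has no hard dependency on it.

```python
import time
from statistics import mean


def benchmark(make_driver, url, runs=3):
    """Return the mean time in seconds to start a driver, load url, and quit."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        driver = make_driver()
        try:
            driver.get(url)
        finally:
            # Always close the browser, even if the page load fails.
            driver.quit()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def make_chrome(headless):
    """Hypothetical factory: build a Chrome driver with or without --headless."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

Calling benchmark(lambda: make_chrome(True), url) and benchmark(lambda: make_chrome(False), url) against the same page gives two numbers to compare. Startup time is included on purpose, since it is part of what a short script pays for.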


References: headless-chrome

Install & run containerized Chrome:

docker pull selenium/standalone-chrome
docker run --rm -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome

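Connect using webdriver.Remote:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote('http://localhost:4444/wd/hub', DesiredCapabilities.CHROME)
driver.set_window_size(1280, 1024)
driver.get('https://www.google.com')

from time import sleep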

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(executable_path="./chromedriver", options=chrome_options)
url = "https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver"
driver.get(url)

sleep(5)

h1 = driver.find_element_by_xpath("//h1[@itemprop='name']").text
print(h1)

Then I run the script on our local machine:

➜ python script.py
Running Selenium with Headless Chrome Webdriver

It works, and it runs with headless Chrome.
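Since the goal is speed, the fixed sleep(5) in the script above is worth a second look: it always costs five seconds, even when the page is ready sooner. Below is a small polling sketch in plain Python. wait_until is a hypothetical helper written for illustration; Selenium ships its own WebDriverWait class for the same purpose, which is probably the better choice in real code.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Possible use in the script above, replacing sleep(5) (assumes the same driver object):
# elements = wait_until(lambda: driver.find_elements_by_xpath("//h1[@itemprop='name']"))
# print(elements[0].text)
```

The condition is always checked at least once, so a page that is already loaded returns immediately without sleeping.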

If you are using a Linux environment, you may have to add --no-sandbox as well, along with specific window size settings. The --no-sandbox flag is not needed on Windows if you set up the user container properly.

Use --disable-gpu only on Windows. Other platforms no longer require it. The --disable-gpu flag is a temporary workaround for a few bugs.

// Configure and start a headless Chrome browser
WebDriverManager.chromedriver().setup();
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--no-sandbox");
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("--disable-gpu");
// chromeOptions.addArguments("--window-size=1400,2100"); // enable this on Linux
driver = new ChromeDriver(chromeOptions);
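The platform advice in this answer (--no-sandbox and a window size on Linux, --disable-gpu only on Windows) could be sketched in Python roughly as follows. headless_args is a hypothetical helper, the default window size is just the value from the commented-out Java line, and the per-platform rules are taken from the answer rather than verified independently.

```python
import platform


def headless_args(system=None, window_size=(1400, 2100)):
    """Return a list of Chrome flags for headless mode on the given OS name."""
    # platform.system() returns e.g. "Linux", "Windows" or "Darwin".
    system = system or platform.system()
    args = ["--headless"]
    if system == "Linux":
        # Per the answer: Linux may need --no-sandbox and an explicit window size.
        args.append("--no-sandbox")
        args.append("--window-size={},{}".format(*window_size))
    elif system == "Windows":
        # Per the answer: --disable-gpu is a workaround needed only on Windows.
        args.append("--disable-gpu")
    return args
```

Each returned flag can then be passed to options.add_argument in a loop, which keeps the platform decision in one place instead of in commented-out lines.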

Once you have Selenium and a web driver installed, the following worked for me with headless Chrome on a Linux cluster:

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs",{"download.default_directory":"/databricks/driver"})
driver = webdriver.Chrome(options=options)  # options= replaces the deprecated chrome_options= keyword

To do (tested on a headless Debian Linux 9.4 server):

  1. Do this:

     # install chrome
     curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
     echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
     apt-get -y update
     apt-get -y install google-chrome-stable

     # install chrome driver
     wget https://chromedriver.storage.googleapis.com/77.0.3865.40/chromedriver_linux64.zip
     unzip chromedriver_linux64.zip
     mv chromedriver /usr/bin/chromedriver
     chown root:root /usr/bin/chromedriver
     chmod +x /usr/bin/chromedriver

  2. Install selenium:

     pip install selenium

    and run this Python code:

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options

     options = Options()
     options.add_argument("no-sandbox")
     options.add_argument("headless")
     options.add_argument("start-maximized")
     options.add_argument("window-size=1900,1080")
     driver = webdriver.Chrome(options=options, executable_path="/usr/bin/chromedriver")
     driver.get("https://www.example.com")
     html = driver.page_source
     print(html)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r"C:\Program Files\Google\Chrome\Application\chromedriver.exe", options=chrome_options)

This works for me.

The Chromium developers recently added a 2nd headless mode (in 2021). See https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c38

The --headless=chrome flag will now allow you to get the full functionality of Chrome in the new headless mode, and you can even run extensions in it.

Usage:

options.add_argument("--headless=chrome")

If something works in regular Chrome, it should now work with the newer headless mode too.
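Which spelling of the flag a given Chrome build understands depends on its version. The sketch below picks a flag from the Chrome major version. Both headless_flag and the cutoffs are assumptions layered on top of this answer: to my knowledge --headless=chrome applies to roughly Chrome 96 through 108 and later releases renamed the mode to --headless=new, so please check the release notes for your version before relying on it.

```python
def headless_flag(chrome_major):
    """Pick a headless flag for a Chrome major version (cutoffs are assumptions)."""
    if chrome_major >= 109:
        # Assumption: from Chrome 109 the newer mode is spelled --headless=new.
        return "--headless=new"
    if chrome_major >= 96:
        # The flag described in this answer.
        return "--headless=chrome"
    # Older builds only know the original headless mode.
    return "--headless"
```

For example, options.add_argument(headless_flag(100)) would add --headless=chrome under these assumed cutoffs.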

As stated by the accepted answer:

options.add_argument("--headless")

These tips might help speed things up, especially in headless mode.

There are quite a few things you can do in headless mode that you can't do in non-headless mode.

Since you will be using headless Chrome, I've found that adding the following flag reduces CPU usage by about 20% for me (I found this to be a CPU and memory hog when looking at htop):

--disable-crash-reporter

This flag only has an effect when you are running headless. It might speed things up for you.

My settings are currently as follows; they reduce CPU usage by about 20%, though the time saving is only marginal:

options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-renderer-backgrounding")
options.add_argument("--disable-background-timer-throttling")
options.add_argument("--disable-backgrounding-occluded-windows")
options.add_argument("--disable-client-side-phishing-detection")
options.add_argument("--disable-crash-reporter")
options.add_argument("--disable-oopr-debug-crash-dump")
options.add_argument("--no-crash-upload")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")
options.add_argument("--disable-low-res-tiling")
options.add_argument("--log-level=3")
options.add_argument("--silent")
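A long run of add_argument calls like this can also be kept as data and applied in a loop, which makes it easier to toggle individual flags while benchmarking. The sketch below is one way to do that. apply_flags and CRASH_REPORTING_FLAGS are hypothetical names, and the helper only assumes the options object has an add_argument method and, optionally, an arguments list, as Selenium's Chrome Options does.

```python
# A subset of the flags listed above, grouped for illustration.
CRASH_REPORTING_FLAGS = [
    "--disable-crash-reporter",
    "--disable-oopr-debug-crash-dump",
    "--no-crash-upload",
]


def apply_flags(options, flags):
    """Add each flag once, skipping duplicates; return the flags actually added."""
    existing = set(getattr(options, "arguments", []))
    added = []
    for flag in flags:
        if flag not in existing:
            options.add_argument(flag)
            existing.add(flag)
            added.append(flag)
    return added
```

Calling apply_flags(options, CRASH_REPORTING_FLAGS) before creating the driver has the same effect as the individual calls, and the return value shows which flags were new.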

I found this to be a pretty good list (full list I think) of command line switches with explanations: https://peter.sh/experiments/chromium-command-line-switches/

Some additional things you can turn off are also mentioned here: https://github.com/GoogleChrome/chrome-launcher/blob/main/docs/chrome-flags-for-tools.md

I hope this helps someone

