Python Selenium Chromedriver not working with --headless option

I am running chromedriver to try to scrape some data from a website. Everything works fine without the headless option. However, when I add that option, the webdriver takes a very long time to load the URL, and when I try to find an element (one that is found when running without --headless), I receive an error.

Using print statements and dumping the HTML after the URL has "loaded", I found that there is no HTML: the document is empty (see the output below).

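import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait


class Fidelity:
    def __init__(self):
        self.url = 'https://eresearch.fidelity.com/eresearch/gotoBL/fidelityTopOrders.jhtml'
        self.options = Options()
        self.options.add_argument("--headless")
        self.options.add_argument("--window-size=1500,1000")
        # executable_path is the Selenium 3 API; Selenium 4 expects a Service object instead
        self.driver = webdriver.Chrome(executable_path='.\\dependencies\\chromedriver.exe', options=self.options)
        print("init")

    def initiate_browser(self):
        self.driver.get(self.url)
        time.sleep(5)
        # Dump the rendered DOM to see whether the page actually loaded
        script = self.driver.execute_script("return document.documentElement.outerHTML")
        print(script)
        print("got url")

    def find_orders(self):
        wait = WebDriverWait(self.driver, 15)
        # ERROR ON THIS LINE: times out in headless mode
        data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
        return data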

This is the entire output:

init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 102, in <module>
    orders = scrape.find_tesla_orders()
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 75, in find_tesla_orders
    tesla = self.driver.find_element_by_xpath("//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']")
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href='https://qr.fidelity.com/embeddedquotes/redirect/research?symbol=TSLA']"}
  (Session info: headless chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 10.0.17763 x86_64)

New error with the updated code:

init
<html><head></head><body></body></html>
url
Traceback (most recent call last):
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 104, in <module>
    orders = scrape.find_tesla_orders()
  File "C:\Users\Zachary\Documents\Python\Tesla Stock Info\Scraper.py", line 76, in find_tesla_orders
    tesla = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

I have tried finding the answer to this through Google, but none of the suggestions work. Is anyone else having this issue with certain websites? Any help appreciated.

Update

Unfortunately, this script still does not work: for some reason the webdriver does not load the page correctly in headless mode, even though everything works perfectly when I run it without the headless option.

For anyone in the future wondering about the fix: some websites just don't load correctly with Chrome's headless option. I don't think there is a way to fix this. Just use a different browser (like Firefox). Thanks to user8426627 for this.
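A minimal sketch of the Firefox route, assuming Selenium 4 with geckodriver available on PATH. The function names firefox_headless_args and make_headless_firefox are hypothetical; the Selenium import is done inside the function so the argument helper can be used (and tested) without a browser installed. Firefox takes a single-dash -headless switch.

```python
def firefox_headless_args():
    """Command-line arguments for a headless Firefox session (pure helper)."""
    # Firefox's headless switch is spelled with a single dash
    return ["-headless"]


def make_headless_firefox():
    """Start headless Firefox; assumes selenium and geckodriver are installed."""
    # Lazy imports: only needed when a browser is actually launched
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options as FirefoxOptions

    options = FirefoxOptions()
    for arg in firefox_headless_args():
        options.add_argument(arg)
    return webdriver.Firefox(options=options)
```

Usage: driver = make_headless_firefox(); driver.get(url); then locate elements exactly as with Chrome.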

Have you tried setting a User-Agent?

I was experiencing the same error. The first thing I did was download the HTML source for both headless and normal mode with:

# driver is an already-running webdriver.Chrome instance
html = driver.page_source
with open("foo.html", "w", encoding="utf-8") as file:
    file.write(html)

The HTML source for headless mode was a short file with this line near the end: "The page cannot be displayed. Please contact the administrator for additional information." The normal mode returned the expected HTML.
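To make the headless/normal comparison above less manual, the two dumps can be diffed with the standard library's difflib. This is a sketch; the file names normal.html and headless.html and both function names are hypothetical, assuming each dump was saved with the snippet above under a different name.

```python
import difflib
from pathlib import Path


def diff_page_dumps(normal_html: str, headless_html: str, context: int = 2) -> list:
    """Return unified-diff lines between the normal and headless page sources."""
    return list(
        difflib.unified_diff(
            normal_html.splitlines(),
            headless_html.splitlines(),
            fromfile="normal.html",
            tofile="headless.html",
            n=context,
            lineterm="",
        )
    )


def diff_page_files(normal_path: str, headless_path: str) -> list:
    """Same as diff_page_dumps, but reads the two saved dumps from disk."""
    normal = Path(normal_path).read_text(encoding="utf-8")
    headless = Path(headless_path).read_text(encoding="utf-8")
    return diff_page_dumps(normal, headless)
```

An empty result means both modes received the same HTML; a block page like the one described shows up as "+" lines on the headless side.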

I solved the issue by adding a User-Agent:

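from selenium import webdriver

# Hard-coded UA string; fake_useragent's UserAgent().random could generate one instead
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument(f'--user-agent={user_agent}')
# chrome_options= is deprecated; pass the options via options=
driver = webdriver.Chrome(executable_path="your_path", options=chrome_options)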
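One likely reason the User-Agent helps: old-style headless Chrome reports "HeadlessChrome/<version>" in navigator.userAgent, which some sites block. Instead of hard-coding an old UA, a sketch (assuming Selenium 4 can locate chromedriver on its own; all function names are hypothetical) reads the headless UA once, swaps "HeadlessChrome" for "Chrome", and restarts with that value.

```python
def normalize_headless_user_agent(user_agent: str) -> str:
    """Turn a headless Chrome UA string into one that looks like regular Chrome."""
    # Old-style headless Chrome advertises itself as "HeadlessChrome/<version>"
    return user_agent.replace("HeadlessChrome", "Chrome")


def user_agent_argument(user_agent: str) -> str:
    """Format a UA string as a Chrome command-line switch."""
    return f"--user-agent={user_agent}"


def make_chrome_with_clean_ua():
    """Probe the headless UA, then start a headless driver with the cleaned UA."""
    from selenium import webdriver  # lazy import: only needed to launch Chrome

    probe_options = webdriver.ChromeOptions()
    probe_options.add_argument("--headless")
    probe = webdriver.Chrome(options=probe_options)
    try:
        ua = probe.execute_script("return navigator.userAgent")
    finally:
        probe.quit()

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument(user_agent_argument(normalize_headless_user_agent(ua)))
    return webdriver.Chrome(options=options)
```

This keeps the UA's Chrome version in sync with the installed browser, at the cost of starting Chrome twice.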

Add an explicit wait. You should also use another locator; the current one matches 3 elements. The element has a unique id attribute.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By

timeout = 15  # seconds
wait = WebDriverWait(self.driver, timeout)
data = wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, '[id*="t_trigger_TSLA"]')))
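What the explicit wait does, roughly: WebDriverWait.until calls the condition repeatedly (every 0.5 s by default), treating NoSuchElementException as "not ready yet", until it returns something truthy or the timeout expires. The stdlib sketch below models that loop; it is a simplified illustration with hypothetical names, not Selenium's source, and raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5, ignored=()):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # keep polling, like WebDriverWait ignoring NoSuchElementException
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

This is also why the question's empty <body> ends in a TimeoutException: the condition never becomes true, so the loop simply runs out of time.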

I need to run the script from the same console without a Google Chrome window appearing, but the browser window still opens when my program runs:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument("--no-sandbox")
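chrome_options.add_argument("--window-size=1920,1080")
print("complete")

# The options must be passed to the driver; otherwise Chrome starts with a visible window
driver = webdriver.Chrome(r'C:\proyectos\python-selenium\driver\chromedriver.exe', options=chrome_options)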
driver.get('https://www.facebook.com/')

Try setting the window size as well as running headless. Add this:

chromeOptions.add_argument("--window-size=1920,1080")

The default window size of the headless browser is tiny. If the code works when headless mode is not enabled, it might be because your element is outside the window.
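To check whether this is what's happening, one option is to ask the browser where the element sits relative to the viewport. A sketch, assuming driver and element come from your existing Selenium session (all names are hypothetical): getBoundingClientRect() is viewport-relative, and window.innerWidth/innerHeight give the viewport size, so the comparison itself is plain Python.

```python
IN_VIEWPORT_JS = """
const r = arguments[0].getBoundingClientRect();
return {left: r.left, top: r.top, right: r.right, bottom: r.bottom,
        width: window.innerWidth, height: window.innerHeight};
"""


def box_in_viewport(box: dict) -> bool:
    """True if the bounding box lies entirely inside the viewport dimensions."""
    return (
        box["left"] >= 0
        and box["top"] >= 0
        and box["right"] <= box["width"]
        and box["bottom"] <= box["height"]
    )


def element_in_viewport(driver, element) -> bool:
    """Run the JS probe in the page and evaluate the returned box."""
    # Selenium converts the returned JS object into a Python dict
    return box_in_viewport(driver.execute_script(IN_VIEWPORT_JS, element))
```

If this returns False in headless mode but True with a visible window, the --window-size argument is the first thing to adjust.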

"Some websites just don't load correctly with the headless option of Chrome."

That statement is actually wrong. I just ran into this problem, where Chrome wasn't detecting the elements. When I saw @LuckyZakary's answer I was shocked, because someone had built a scraper for the same website with Node.js and didn't get this error.

@AtulGumar's answer helped on Windows, but it failed on an Ubuntu server, so it wasn't enough. After reading this all the way to the bottom, I found that what @AtulGumar missed was adding the --disable-gpu flag.

So it works for me on Windows and on an Ubuntu server without a GUI with these options:

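from selenium import webdriver

webOptions = webdriver.ChromeOptions()
webOptions.add_argument("--headless")  # same effect as webOptions.headless = True, which Selenium 4 deprecated
webOptions.add_argument("--window-size=1920,1080")
webOptions.add_argument("--disable-gpu")  # two ASCII hyphens, not an en dash
driver = webdriver.Chrome(options=webOptions)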

I also installed xvfb and other packages as suggested here:

sudo apt-get -y install xorg xvfb gtk2-engines-pixbuf
sudo apt-get -y install dbus-x11 xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic xfonts-scalable

and ran:

Xvfb -ac :99 -screen 0 1280x1024x16 &
export DISPLAY=:99
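If you would rather start Xvfb from the scraper itself than from the shell, a sketch using only the standard library (Linux only, assuming the Xvfb binary installed above is on PATH; the function names are hypothetical) builds the same command line and sets DISPLAY for the current process, so a Chrome launched afterwards by Selenium inherits it.

```python
import os
import subprocess


def xvfb_command(display: int = 99, width: int = 1280, height: int = 1024, depth: int = 16) -> list:
    """Build the same command as: Xvfb -ac :99 -screen 0 1280x1024x16"""
    return ["Xvfb", "-ac", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def start_xvfb(display: int = 99) -> subprocess.Popen:
    """Start Xvfb in the background and point DISPLAY at it."""
    proc = subprocess.Popen(xvfb_command(display))
    # Equivalent of `export DISPLAY=:99`, but scoped to this process and its children
    os.environ["DISPLAY"] = f":{display}"
    return proc
```

Call proc.terminate() once the driver has quit; Xvfb may also need a moment to come up before Chrome connects to it.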

Try passing the executable path via a Service object:

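from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def make_driver():
    options = Options()
    options.add_argument('--incognito')
    options.add_argument('--disable-extensions')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-gpu')
    options.add_argument('--headless')
    # webdriver_manager downloads a chromedriver matching the installed Chrome
    service = Service(executable_path=ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)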

It works for me :)
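Several snippets in this thread originally contained malformed switches (an en dash in "–disable-gpu", three hyphens in "---incognito"), which Chrome will most likely not recognize as the intended flags. A small stdlib check, a hypothetical helper you could run over options.arguments before starting the driver, catches these typos. It is deliberately strict: it also flags forms like "window-size=1920,1080" that omit the leading hyphens, and it cannot tell that "disable-extension" should be "disable-extensions".

```python
import re

# A well-formed switch: two ASCII hyphens, a letter, then letters/digits/hyphens,
# optionally followed by "=value"
_SWITCH = re.compile(r"--[A-Za-z][A-Za-z0-9-]*(=.*)?\Z", re.DOTALL)


def malformed_switches(args):
    """Return the arguments that don't look like well-formed --name[=value] switches."""
    return [arg for arg in args if not _SWITCH.match(arg)]
```

Usage: assert not malformed_switches(options.arguments), placed just before webdriver.Chrome(...).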
