Firefox with Selenium (Headless)

How do I use Selenium with Firefox to scrape websites?

Install Firefox, Xvfb, and Selenium:

echo "deb http://packages.linuxmint.com debian import" >> /etc/apt/sources.list && apt-get update
apt-get install firefox xvfb python-dev python-pip
pip install pyvirtualdisplay selenium
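Before running the script, it can help to confirm that the installed tools are actually on PATH. Below is a minimal standard-library sketch; the helper name `missing_tools` is hypothetical, and the tool list is an assumption: `Xvfb` is the binary the xvfb package provides, and `geckodriver` is included because Selenium 3 and later drive Firefox through it rather than talking to the browser directly.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Binaries this setup expects; "geckodriver" is an assumption for Selenium 3+.
REQUIRED_TOOLS = ["firefox", "Xvfb", "geckodriver"]


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test.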

selenium_scrape.py

from pyvirtualdisplay import Display
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

display = Display(visible=0, size=(800, 600))
display.start()

def init_driver():
    driver = webdriver.Firefox()
    driver.wait = WebDriverWait(driver, 5)
    return driver

def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
    box = driver.wait.until(EC.presence_of_element_located(
        (By.NAME, "q")))
    button = driver.wait.until(EC.element_to_be_clickable(
        (By.NAME, "btnK")))
    box.send_keys(query)
button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")

if __name__ == "__main__":
    driver = init_driver()
    lookup(driver, "Selenium")
    time.sleep(5)
    driver.quit()

display.stop()

Error:

  File "selenium_scrape.py", line 20
    box = driver.wait.until(EC.presence_of_element_located(
      ^
IndentationError: expected an indented block

The difference is that a packaged Chrome browser is not enough on its own; you also need a separate driver binary: chromedriver.

Get the latest version here: Chromedriver

Now you have two options: either move the downloaded chromedriver somewhere on your PATH so it is always accessible (option 1), or define in your script where to find it (option 2).

Option 1: move it into your PATH

Move it so that webdriver.Chrome() can find it:

sudo mv /path/to/download/chromedriver /usr/bin

Also make it executable:

sudo chmod a+x /usr/bin/chromedriver
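If you would rather do this step from Python (for example inside a setup script), the same permission change can be sketched with the standard library. `make_executable` and `is_executable` are hypothetical helper names, not part of Selenium; writing to /usr/bin still requires root, just as with the shell command.

```python
import os
import stat


def make_executable(path: str) -> None:
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    # S_IXUSR | S_IXGRP | S_IXOTH == 0o111, the "x" bits that a+x sets
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_executable(path: str) -> bool:
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

For example, `make_executable("/usr/bin/chromedriver")` followed by `is_executable("/usr/bin/chromedriver")` should return `True` once the file is in place.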

Option 2: leave it where it is

Alternatively, tell your script where the driver is:

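import os
from selenium import webdriver

chromedriver = "/Users/you/Downloads/chromedriver"  # where you unpacked it
os.environ["webdriver.chrome.driver"] = chromedriver
# Selenium 3.x: the first positional argument is executable_path
driver = webdriver.Chrome(chromedriver)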
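The two options can also be combined: use an explicit path when one is given, otherwise fall back to whatever is on PATH. This is a minimal sketch; `resolve_chromedriver` is a hypothetical helper, and it only locates the binary, it does not start Chrome.

```python
import os
import shutil
from typing import Optional


def resolve_chromedriver(explicit_path: Optional[str] = None) -> str:
    """Return a usable chromedriver path: explicit path first, then PATH."""
    # Option 2: an explicit location wins if it points at an executable file
    if explicit_path is not None:
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        raise FileNotFoundError(f"not an executable file: {explicit_path}")
    # Option 1: fall back to the PATH lookup (e.g. /usr/bin/chromedriver)
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver not found on PATH")
    return found
```

With Selenium 3.x the result can be passed straight to `webdriver.Chrome(...)`; with Selenium 4 it would instead be wrapped as `webdriver.Chrome(service=Service(path))`, using `Service` from `selenium.webdriver.chrome.service`.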

(Note: the original question was about Chrome, so my answer is about Chrome, not Firefox.)

For me it works if I just extract chromedriver into the same folder as the script.

Then I run it like this:

Xvfb :99 -ac -screen 0 1280x1024x16 &
echo 'Starting the test'
PATH=$PATH:. python selenium_scrape.py

This starts Xvfb on display :99 and adds the current directory (where chromedriver lives) to PATH.
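The same launch sequence can be sketched in Python with `subprocess`, which keeps the environment changes out of your shell session. This is an illustration under assumptions: `headless_env` and `run_headless` are hypothetical names, `Xvfb` is assumed to be on PATH, and the one-second pause is a crude guess at how long Xvfb needs to start (the shell version above does not wait at all).

```python
import os
import subprocess
import sys
import time
from typing import Dict, Mapping


def headless_env(base: Mapping[str, str], display: str = ":99",
                 extra_path: str = ".") -> Dict[str, str]:
    """Copy base env, point DISPLAY at Xvfb and append extra_path to PATH."""
    env = dict(base)
    env["DISPLAY"] = display
    # equivalent of PATH=$PATH:. in the shell commands
    parts = [p for p in (env.get("PATH", ""), extra_path) if p]
    env["PATH"] = os.pathsep.join(parts)
    return env


def run_headless(script: str, display: str = ":99") -> int:
    """Start Xvfb on `display`, run `script` against it, then stop Xvfb."""
    xvfb = subprocess.Popen(
        ["Xvfb", display, "-ac", "-screen", "0", "1280x1024x16"])
    try:
        time.sleep(1)  # crude: give Xvfb a moment to start listening
        env = headless_env(os.environ, display)
        return subprocess.call([sys.executable, script], env=env)
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Calling `run_headless("selenium_scrape.py")` would then replace the three shell lines. Because `headless_env` also sets `DISPLAY`, the `os.environ['DISPLAY'] = ':99'` line in the script becomes optional.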

And here is a modified version of your script that works for me:

import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# comment this out to run on the real display
os.environ['DISPLAY'] = ':99'

def init_driver():
    driver = webdriver.Chrome()
    driver.wait = WebDriverWait(driver, 5)
    return driver

def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        # once we type the query, this button disappears
        # button = driver.wait.until(EC.element_to_be_clickable(
        #     (By.NAME, "btnK")))
        box.send_keys(query)
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnG")))
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")

if __name__ == "__main__":
    driver = init_driver()
    lookup(driver, "Selenium")
    time.sleep(5)
    driver.quit()
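Both versions rely on `driver.wait.until(...)`: `WebDriverWait` re-evaluates the condition until it returns something truthy or the timeout (5 seconds here) expires, and then raises `TimeoutException`. The stand-alone sketch below illustrates that polling loop under simplifying assumptions. It is not Selenium's actual implementation: it raises the built-in `TimeoutError`, does not swallow exceptions such as `NoSuchElementException`, and `wait_until` is a hypothetical name. The 0.5 s poll interval mirrors `WebDriverWait`'s default.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 5.0,
               poll: float = 0.5) -> T:
    """Call condition() until it returns something truthy or timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # like until(), hand back the found element/value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

This is also why the `except TimeoutException` branch only runs after the full five seconds have passed without the element appearing.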

The question is (at the moment) about an indentation error. This can be easily fixed by indenting the body of the try block:

def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnK")))
        box.send_keys(query)
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")
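The rule behind the fix: the statements under `try:` must be indented further than the `try:` line itself. Python rejects the file while compiling it, before any Selenium code runs, so the error can be reproduced without a browser. A minimal sketch; `BROKEN`, `FIXED`, and `indentation_ok` are illustrative names, and the snippets are only compiled, never executed.

```python
BROKEN = '''
def lookup(driver, query):
    try:
    box = driver.wait.until(EC.presence_of_element_located((By.NAME, "q")))
    except TimeoutException:
        print("Box or Button not found in google.com")
'''

FIXED = '''
def lookup(driver, query):
    try:
        box = driver.wait.until(EC.presence_of_element_located((By.NAME, "q")))
    except TimeoutException:
        print("Box or Button not found in google.com")
'''


def indentation_ok(source: str) -> bool:
    """Compile source without running it; False on IndentationError."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError:
        return False
    return True
```

Here `indentation_ok(BROKEN)` is `False` and `indentation_ok(FIXED)` is `True`. From the command line, `python -m py_compile selenium_scrape.py` performs the same check on the whole file.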
