
How can we use Selenium Webdriver in colab.research.google.com?

I want to use the Selenium Webdriver for Chrome in colab.research.google.com for fast processing. I was able to install Selenium using !pip install selenium, but the Chrome webdriver needs a path to webdriverChrome.exe. How am I supposed to use it?

PS: colab.research.google.com is an online platform that provides GPUs for fast computation on deep learning problems. Please refrain from solutions such as webdriver.Chrome(path).

You can do it by installing the Chromium webdriver and adjusting some options so that it does not crash in Google Colab:

!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',options=chrome_options)
wd.get("https://www.webite-url.com")
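Before constructing the browser, it can help to confirm that the install step really put the driver on PATH. The following is a minimal sketch using only the standard-library `shutil.which`; the helper name `find_driver` is hypothetical, and the binary name `chromedriver` is assumed from the commands above.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the absolute path of the driver binary found on PATH, or None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    # Assumption: the driver comes from the apt install step shown above.
    print("chromedriver not found on PATH; re-run the apt install step")
else:
    print("chromedriver found at", path)
```

If the lookup returns None, webdriver.Chrome will most likely fail to start as well, so this check gives a clearer error message earlier.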

This one worked in Colab:

!pip install selenium
!apt-get update 
!apt install chromium-chromedriver

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)

I made my own library to make it easy.

!pip install kora -q
from kora.selenium import wd
wd.get("https://www.website.com")

PS: I forget how I searched and experimented until it worked, but I first wrote and shared it in this gist in Dec 2018.

I don't have enough reputation to comment. :(

However, @Thomas's answer still works as of 06.10.2021, with just one simple change, since right off the bat you'll get DeprecationWarning: use options instead of chrome_options

Working code below:

!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',options=options)
wd.get("https://stackoverflow.com/questions/51046454/how-can-we-use-selenium-webdriver-in-colab-research-google-com")
wd.title
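In more recent Selenium 4 releases, to the best of my knowledge, the driver path is no longer accepted as the first positional argument and is passed through a `Service` object instead. The sketch below shows that style; the function name `make_driver` and the constant `COLAB_FLAGS` are hypothetical, and the default path `/usr/bin/chromedriver` is an assumption based on the `cp` step above.

```python
# Flags taken from the answer above; they keep headless Chrome stable in Colab.
COLAB_FLAGS = ("--headless", "--no-sandbox", "--disable-dev-shm-usage")


def make_driver(driver_path="/usr/bin/chromedriver", flags=COLAB_FLAGS):
    """Build a headless Chrome driver using the Selenium 4 Service API."""
    # Imported lazily so this cell still loads before `pip install selenium`.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    # `service=` carries the driver path; `options=` replaces chrome_options.
    return webdriver.Chrome(service=Service(driver_path), options=options)


# Usage (requires selenium and chromedriver to be installed):
# wd = make_driver()
# wd.get("https://stackoverflow.com")
# print(wd.title)
```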

To use Selenium in Google Colab, run the following steps in the Colab notebook:

!pip install kora -q

How to use it inside Colab:

from kora.selenium import wd
wd.get("enter any website here")

You can also use it with Beautiful Soup:

import bs4 as soup
wd.get("enter any website here")
html = soup.BeautifulSoup(wd.page_source, "html.parser")  # name the parser explicitly

Colab and Selenium: how can data be extracted from whoscored.com?

#    https://www.whoscored.com

# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome(options=options)
wd.get("https://www.whoscored.com")
print(wd.page_source)  # results
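In a long-running notebook, each driver that is never closed leaves a headless Chrome process behind. One possible way to wrap the "open, fetch, read" steps above so that the browser is always shut down is sketched here; the helper name `fetch_page_source` is hypothetical, and it only assumes the driver exposes `get`, `page_source`, and `quit`, as Selenium drivers do.

```python
def fetch_page_source(driver, url):
    """Load `url` in `driver`, return the page HTML, and always quit the driver."""
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the browser session and the chromedriver process,
        # even if get() raised an exception.
        driver.quit()


# Usage with the driver built above:
# html = fetch_page_source(webdriver.Chrome(options=options), "https://www.whoscored.com")
```

Note that the driver cannot be reused after this call, so create a new one for each fetch or move the `quit()` to the end of the notebook instead.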

You can get rid of the .exe file by using WebDriverManager. Instead of this:

System.setProperty("webdriver.gecko.driver", "driverpath/.exe");
WebDriver driver = new FirefoxDriver();

you will write this:

WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();

All you need is to add the dependency to the POM file (I'm assuming you are using Maven or some other build tool). Please see my full answer about how to use this at this link: Using WebdriverManager
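The snippets above are Java, while a Colab notebook runs Python. A roughly comparable approach in Python uses the third-party `webdriver-manager` package, which is a separate project from the Java WebDriverManager. This is only a sketch: it assumes `pip install selenium webdriver-manager` has been run and that a Chrome or Chromium browser is already installed, and the function name `make_managed_driver` is hypothetical.

```python
def make_managed_driver():
    """Build a headless Chrome driver whose chromedriver is fetched automatically."""
    # Imported lazily so this cell loads before the packages are installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for flag in ("--headless", "--no-sandbox", "--disable-dev-shm-usage"):
        options.add_argument(flag)
    # install() downloads a chromedriver if needed and returns its local path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


# Usage (requires the packages and a browser to be installed):
# driver = make_managed_driver()
# driver.get("https://www.website.com")
```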
