Issue running Selenium on AWS Lambda
I am currently trying to implement a scraper that checks twice a day whether certain PDFs have changed names. Unfortunately, finding the PDFs requires manipulating the website, so the best solution in my mind is a combination of Selenium and AWS Lambda.
To begin, I followed this tutorial. I completed it but ran into this error from Lambda:
START RequestId: 18637c6d-ea75-40ee-8789-374654700b99 Version: $LATEST
Starting google.com
Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
: WebDriverException
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 46, in lambda_handler
    driver = webdriver.Chrome(chrome_options=chrome_options)
  File "/var/task/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/var/task/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
Others have hit this error, and the tutorial's author "resolved" it by linking to this Stack Overflow page. I have gone through it, but all the answers pertain to using headless Chromium on a desktop, not on AWS Lambda.
A couple of changes I've tried, to no avail:
1) Changing the chromedriver and headless-chromium to .exe files
2) Changing this line of code to include the executable_path
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "/bin/chromedriver.exe")
Any help in getting Selenium and AWS Lambda working together would be greatly appreciated.
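For anyone debugging the same message, a minimal diagnostic sketch may help narrow down whether the binary is missing, present but not executable, or simply not on PATH. The helper names `describe_binary` and `on_path` are hypothetical and only use the Python standard library; the path in the usage note is taken from the question.

```python
import os
import shutil


def describe_binary(path):
    """Return 'missing', 'not executable' or 'ok' for the file at `path`.

    Hypothetical helper for illustration; not part of Selenium.
    """
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


def on_path(name):
    """Return True if `name` resolves to an executable on the current PATH."""
    return shutil.which(name) is not None
```

Calling `print(describe_binary(os.getcwd() + "/bin/chromedriver"))` at the top of the handler would show in the Lambda log which of the three cases applies.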
I had the same issue, and it was caused by the binary files sitting in a location from which they could not be executed. Adding a function that copies them to /tmp/bin and then launching them from there fixed it. See the example below, which I just got working while researching this error. (Apologies for the messy code.)
import os
import shutil
import time

from selenium import webdriver

# The deployment package directory is read-only in Lambda, so the binaries
# are copied to /tmp (writable) and given the executable bit there.
BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd() + "/bin"


def _init_bin(executable_name):
    # time.clock() was removed in Python 3.8; perf_counter() works on 3.3+.
    start = time.perf_counter()
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    print("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    shutil.copy2(currfile, newfile)
    print("Giving new binaries permissions for lambda")
    os.chmod(newfile, 0o775)
    elapsed = time.perf_counter() - start
    print(executable_name + " ready in " + str(elapsed) + "s.")


def handler(event, context):
    # Copy both binaries to /tmp/bin before starting the browser.
    _init_bin("headless-chromium")
    _init_bin("chromedriver")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')
    # Point Selenium at the copied browser binary.
    chrome_options.binary_location = "/tmp/bin/headless-chromium"
    # Selenium 3 style call: the first positional argument is the chromedriver path.
    driver = webdriver.Chrome("/tmp/bin/chromedriver", chrome_options=chrome_options)
    driver.get('https://en.wikipedia.org/wiki/Special:Random')
    # Selenium 3 API: read the article title from the page heading.
    line = driver.find_element_by_class_name('firstHeading').text
    print(line)
    driver.quit()
    return line
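Because Lambda may reuse an execution environment between invocations, /tmp/bin can already hold the binaries on a later call. The following is a minimal sketch, not the answer's own code, of a copy step that skips the copy when the file is already there and only adds execute bits instead of overwriting the mode. The name `ensure_binary` and its `src_dir` and `dst_dir` parameters are assumptions made for illustration.

```python
import os
import shutil
import stat


def ensure_binary(name, src_dir, dst_dir):
    """Copy `name` from src_dir to dst_dir once and mark it executable.

    Returns the destination path. Hypothetical helper, not part of Selenium.
    """
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, name)
    # Skip the copy when a previous invocation already placed the file.
    if not os.path.isfile(dst):
        shutil.copy2(os.path.join(src_dir, name), dst)
    # Add execute permission for user, group and others to the existing mode.
    mode = os.stat(dst).st_mode
    os.chmod(dst, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dst
```

In the handler above, this would be called as `ensure_binary("chromedriver", os.getcwd() + "/bin", "/tmp/bin")`, and the returned path passed to `webdriver.Chrome`.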
I also had the same issue, but I have fixed it now. In my case, the Python version on Lambda was not the same as the one in my Dockerfile.
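One way to catch such a mismatch early is to compare the interpreter version at runtime against the version used to build the package. This is only a sketch under assumptions: the helper names are hypothetical, and the expected version string (for example the one from the Dockerfile's base image) has to be supplied by you.

```python
import sys


def python_minor_version():
    """Return the running interpreter version as 'major.minor', e.g. '3.7'."""
    return "{}.{}".format(sys.version_info[0], sys.version_info[1])


def check_build_version(expected):
    """Raise RuntimeError if the runtime differs from the build version.

    `expected` is the 'major.minor' string used when building the package
    (an assumption: you record it yourself, e.g. from the Dockerfile).
    """
    actual = python_minor_version()
    if actual != expected:
        raise RuntimeError(
            "Built for Python {} but running on Python {}".format(expected, actual)
        )
```

Calling `check_build_version(...)` as the first line of the handler makes the mismatch show up as a clear message in the Lambda log.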