Selenium works on AWS EC2 but not on AWS Lambda
I've looked at and tried nearly every other post on this topic, with no luck.
I'm using Python 3.6, so I'm using the following AMI: amzn-ami-hvm-2018.03.0.20181129-x86_64-gp2 (see here). Once I SSH into my EC2 instance, I download Chrome with:
sudo curl https://intoli.com/install-google-chrome.sh | bash
cp -r /opt/google/chrome/ /home/ec2-user/
google-chrome-stable --version
# Google Chrome 86.0.4240.198
Then I download and unzip the matching ChromeDriver:
sudo wget https://chromedriver.storage.googleapis.com/86.0.4240.22/chromedriver_linux64.zip
sudo unzip chromedriver_linux64.zip
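Since ChromeDriver 70, the ChromeDriver major version is meant to match the Chrome major version, so a quick sanity check is to compare the first number of `google-chrome-stable --version` with that of `./chromedriver --version`. The sketch below is a hypothetical helper (`major_version` and `versions_match` are illustrative names, not part of any library); it assumes four-part version strings and deliberately rejects the older `2.x` ChromeDriver numbering, where majors do not line up with Chrome.

```python
import re


def major_version(version_output):
    """Return the major number of the first four-part version in the text."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        # Old ChromeDriver builds (e.g. 2.43) use a different scheme
        raise ValueError("no four-part version found in %r" % version_output)
    return int(match.group(1))


def versions_match(chrome_output, chromedriver_output):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_output) == major_version(chromedriver_output)
```

For the versions in this question, `versions_match("Google Chrome 86.0.4240.198", "ChromeDriver 86.0.4240.22")` is true, so a version mismatch is unlikely to be the cause here.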
I install python36 and selenium with:
sudo yum install python36 -y
sudo /usr/bin/pip-3.6 install selenium
Then I run the script:
import os
import selenium
from selenium import webdriver
CURR_PATH = os.getcwd()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1280x1696')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--hide-scrollbars')
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.add_argument('--v=99')
chrome_options.add_argument('--single-process')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--remote-debugging-port=9222')
chrome_options.binary_location = f"{CURR_PATH}/chrome/google-chrome"
driver = webdriver.Chrome(
    executable_path=f"{CURR_PATH}/chromedriver",
    chrome_options=chrome_options
)
driver.get("https://www.google.com/")
html = driver.page_source
print(html)
This works.
I then zip my chromedriver and Chrome files:
mkdir tmp
mv chromedriver tmp
mv chrome tmp
cd tmp
zip -r9 ../chrome.zip chromedriver chrome
and copy the zipped file to an S3 bucket.
This is my Lambda function:
import os
import boto3
from botocore.exceptions import ClientError
import zipfile
import selenium
from selenium import webdriver
s3 = boto3.resource('s3')
def handler(event, context):
    chrome_bucket = os.environ.get('CHROME_S3_BUCKET')
    chrome_key = os.environ.get('CHROME_S3_KEY')
    # DOWNLOAD HEADLESS CHROME FROM S3
    try:
        # with open('/tmp/headless_chrome.zip', 'wb') as data:
        s3.meta.client.download_file(chrome_bucket, chrome_key, '/tmp/chrome.zip')
        print(os.listdir('/tmp'))
    except ClientError as e:
        raise e
    # UNZIP HEADLESS CHROME
    try:
        with zipfile.ZipFile('/tmp/chrome.zip', 'r') as zip_ref:
            zip_ref.extractall('/tmp')
        # FREE UP SPACE
        os.remove('/tmp/chrome.zip')
        print(os.listdir('/tmp'))
    except:
        raise ValueError('Problem with unzipping Chrome executable')
    # CHANGE PERMISSION OF CHROME
    try:
        os.chmod('/tmp/chromedriver', 0o775)
        os.chmod('/tmp/chrome/chrome', 0o775)
        os.chmod('/tmp/chrome/google-chrome', 0o775)
    except:
        raise ValueError('Problem with changing permissions to Chrome executable')
    # GET LINKS
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--remote-debugging-port=9222')
    chrome_options.binary_location = "/tmp/chrome/google-chrome"
    driver = webdriver.Chrome(
        executable_path="/tmp/chromedriver",
        chrome_options=chrome_options
    )
    driver.get("https://www.google.com/")
    html = driver.page_source
    print(html)
I'm able to see my unzipped files in the /tmp path.
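The handler above restores the executable bit on three files by hand because `zipfile.ZipFile.extractall()` does not apply the Unix permission bits stored in the archive; any other executable inside the `chrome` directory stays non-executable after extraction. A minimal sketch of an alternative, assuming the archive was created on Linux with `zip` (which stores the mode in the high 16 bits of `ZipInfo.external_attr`), is to extract entry by entry and re-apply each stored mode. `extract_with_permissions` is a hypothetical helper, not part of the standard library.

```python
import os
import zipfile


def extract_with_permissions(zip_path, dest_dir):
    """Extract zip_path into dest_dir and restore stored Unix permission bits.

    Returns the list of extracted paths.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            path = zf.extract(info, dest_dir)
            # Unix mode lives in the upper 16 bits; keep only permission bits
            mode = (info.external_attr >> 16) & 0o7777
            if mode:
                os.chmod(path, mode)
            extracted.append(path)
    return extracted
```

In the handler, `extract_with_permissions('/tmp/chrome.zip', '/tmp')` would take the place of the `extractall` call and the individual `os.chmod` calls, under the assumption above about how the zip was built.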
And here is my error:
{
"errorMessage": "Message: unknown error: unable to discover open pages\n",
"errorType": "WebDriverException",
"stackTrace": [
[
"/var/task/lib/observer.py",
69,
"handler",
"chrome_options=chrome_options"
],
[
"/var/task/selenium/webdriver/chrome/webdriver.py",
81,
"__init__",
"desired_capabilities=desired_capabilities)"
],
[
"/var/task/selenium/webdriver/remote/webdriver.py",
157,
"__init__",
"self.start_session(capabilities, browser_profile)"
],
[
"/var/task/selenium/webdriver/remote/webdriver.py",
252,
"start_session",
"response = self.execute(Command.NEW_SESSION, parameters)"
],
[
"/var/task/selenium/webdriver/remote/webdriver.py",
321,
"execute",
"self.error_handler.check_response(response)"
],
[
"/var/task/selenium/webdriver/remote/errorhandler.py",
242,
"check_response",
"raise exception_class(message, screen, stacktrace)"
]
]
}
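"unable to discover open pages" only reports that ChromeDriver could not reach a browser page; the `WebDriverException` message does not include the browser's own stderr. One way to narrow it down is to check that the binary exists and is executable, then launch it directly, outside ChromeDriver, and read its stderr. The sketch below is a hypothetical helper using only the standard library (`subprocess.PIPE` rather than `capture_output`, so it also runs on Python 3.6).

```python
import os
import subprocess


def diagnose_binary(path, args=(), timeout=60, tail_lines=20):
    """Report whether `path` exists, is executable, and what it prints on stderr."""
    if not os.path.isfile(path):
        return "missing: %s" % path
    if not os.access(path, os.X_OK):
        return "not executable: %s" % path
    try:
        proc = subprocess.run([path] + list(args), stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timed out after %s s: %s" % (timeout, path)
    except OSError as exc:
        # e.g. wrong architecture or a missing interpreter for a wrapper script
        return "could not start %s: %s" % (path, exc)
    stderr = proc.stderr.decode("utf-8", errors="replace").splitlines()
    return "exit code %d\n%s" % (proc.returncode, "\n".join(stderr[-tail_lines:]))
```

Inside the handler, something like `print(diagnose_binary("/tmp/chrome/google-chrome", ["--headless", "--no-sandbox", "--single-process", "--dump-dom", "https://www.google.com/"]))` would show whether Chrome itself starts in the Lambda environment, for example whether it reports missing shared libraries.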
EDIT: I am willing to try anything at this point: different versions of Chrome or Chromium, ChromeDriver, Python, or Selenium.
EDIT2: The answer below did not solve the problem.
This error message...
"errorMessage": "Message: unknown error: unable to discover open pages\n",
"errorType": "WebDriverException"
...implies that ChromeDriver was unable to initiate/spawn a new Browsing Context, i.e. a Chrome browser session.
It seems the issue is related to Chrome's sandboxing security feature.
A common cause for Chrome to crash during startup is running Chrome as the root user (administrator) on Linux. While it is possible to work around this issue by passing the --no-sandbox flag when creating your WebDriver session, such a configuration is unsupported and highly discouraged. You need to configure your environment to run Chrome as a regular user instead.
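To check whether the root-user scenario applies, one can print which user the Python process runs as; the Chrome process launched through ChromeDriver inherits that user. This is a hypothetical standard-library helper (`describe_current_user` is an illustrative name), Unix-only because it uses the `pwd` module.

```python
import os
import pwd


def describe_current_user():
    """Return the effective uid, user name (if known) and whether it is root."""
    uid = os.geteuid()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # Some sandboxed environments run under a uid with no passwd entry
        name = None
    return {"uid": uid, "name": name, "is_root": uid == 0}
```

Printing `describe_current_user()` once on EC2 and once inside the Lambda handler shows whether the two environments actually differ in this respect.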
More details about your use case would have helped us better analyze the arguments you used and the root cause of the error. However, a few thoughts:
So you may need to drop the --no-sandbox option. Here is the link to the Sandbox story.
Some more considerations:
When using the --headless option, you won't be able to use --window-size=1280x1696 due to certain constraints. You can find a couple of relevant detailed discussions in:
--disable-gpu was meant to enable google-chrome-headless on the Windows platform. It was needed because SwiftShader previously failed an assert on Windows in headless mode. You can find a relevant detailed discussion in ERROR:gpu_process_transport_factory.cc(1007)-Lost UI shared context : while initializing Chrome browser through ChromeDriver in Headless mode
I don't see any specific requirement for the --disable-dev-shm-usage, --hide-scrollbars, --enable-logging, --log-level=0, --v=99, --single-process and --remote-debugging-port=9222 arguments, so you may opt to drop them for the time being and add them back as per your test specification. You can find a couple of relevant detailed discussions in:
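The "drop them, then add them back" approach can be made systematic: start from a small base set and add one optional argument at a time, so the first set that fails points at the culprit. The generator below is a hypothetical helper that only builds the argument lists; it does not touch Selenium.

```python
def incremental_arg_sets(base_args, optional_args):
    """Yield base_args, then base_args plus one more optional argument at a time.

    Arguments already present are skipped, so every yielded list is distinct.
    """
    current = list(base_args)
    yield list(current)
    for arg in optional_args:
        if arg not in current:
            current.append(arg)
            yield list(current)
```

Each yielded list could be applied with `chrome_options.add_argument(arg)` in a loop before creating the driver, for example with `["--headless", "--no-sandbox"]` as the base and the flags listed above as the optional arguments.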
I was finally able to get it to work using:
Python 3.7
selenium==3.14.0
headless-chromium v1.0.0-55
chromedriver 2.43
https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip
https://chromedriver.storage.googleapis.com/2.43/chromedriver_linux64.zip
I added headless-chromium and chromedriver to a Lambda Layer.
Setting permissions to 755 for both works.
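Lambda extracts layer contents into /opt, so files at the root of the layer archive end up as /opt/headless-chromium and /opt/chromedriver. A minimal sketch of building such an archive with the 755 mode recorded for each file, assuming the binaries have already been downloaded and unzipped locally; `build_layer_zip` is a hypothetical helper, not an AWS tool.

```python
import os
import zipfile


def build_layer_zip(file_paths, out_path, mode=0o755):
    """Store each file at the root of out_path with the given Unix mode."""
    with zipfile.ZipFile(out_path, "w") as zf:
        for path in file_paths:
            info = zipfile.ZipInfo(os.path.basename(path))
            info.create_system = 3  # mark entry as Unix so the mode is honored
            info.external_attr = (0o100000 | mode) << 16  # regular file + mode
            info.compress_type = zipfile.ZIP_DEFLATED
            with open(path, "rb") as f:
                zf.writestr(info, f.read())
    return out_path
```

For example, `build_layer_zip(["headless-chromium", "chromedriver"], "layer.zip")` produces an archive that can be uploaded as the layer.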
The Lambda function looks like this:
import os
import selenium
from selenium import webdriver
def handler(event, context):
    print(os.listdir('/opt'))
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.binary_location = "/opt/headless-chromium"
    driver = webdriver.Chrome(
        executable_path="/opt/chromedriver",
        chrome_options=chrome_options
    )
    driver.get("https://www.google.com/")
    html = driver.page_source
    driver.close()
    driver.quit()
    print(html)
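If `driver.get()` raises, the handler above never reaches `driver.quit()`, which can leave Chrome running in a warm Lambda environment. A minimal sketch of the same fetch wrapped in `try`/`finally`, written against any object with Selenium's `get`, `page_source` and `quit` members; `fetch_page_source` and the `make_driver` factory are hypothetical names. In Selenium, `quit()` closes every window and ends the session, so a separate `close()` is not needed.

```python
def fetch_page_source(make_driver, url):
    """Open url with a driver from make_driver() and always quit the driver."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs whether get() succeeded or raised
        driver.quit()
```

In the handler this would be called as `fetch_page_source(lambda: webdriver.Chrome(executable_path="/opt/chromedriver", chrome_options=chrome_options), "https://www.google.com/")`, reusing the Selenium 3 arguments from the answer.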
Hope this helps someone in Q4 2020 and after.
The answer from @CPak worked for me. I only had to copy headless-chromium and chromedriver to /tmp and grant permissions; the rest of the code is the same:
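import os
from shutil import copyfile
def permissions(origin_path, destiny_path):
    # Copy the binary out of the layer directory (/opt) into writable /tmp
    # and make it executable
    copyfile(origin_path, destiny_path)
    os.chmod(destiny_path, 0o775)
def lambda_handler(event, context):
    permissions('/opt/chromedriver', '/tmp/chromedriver')
    permissions('/opt/headless-chromium', '/tmp/headless-chromium')
    # rest of the handler as in @CPak's answer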
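Because /tmp can survive between warm invocations of the same Lambda execution environment, the copy does not need to be repeated on every call. A minimal sketch, assuming a same-size file at the destination is a previous copy; `ensure_executable_copy` is a hypothetical variant of the `permissions` helper above.

```python
import os
import shutil


def ensure_executable_copy(src, dst, mode=0o775):
    """Copy src to dst unless a same-size copy exists, then chmod it; return dst."""
    already_copied = os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src)
    if not already_copied:
        shutil.copyfile(src, dst)
    os.chmod(dst, mode)
    return dst
```

Usage mirrors the original helper, e.g. `ensure_executable_copy('/opt/chromedriver', '/tmp/chromedriver')`.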
I'm a big fan of this answer because a few months ago it allowed me to properly run a serverless scraper on AWS Lambda. But a few days ago this implementation began to fail, and after hours and hours of searching I came to the conclusion that the binaries given here by @CPak (for Chrome version 69) are too old to run on "modern" websites.
In this GitHub repo I found a file called chromium.zip, which is the headless-chromium binary for version 86.0.4240.0, and here I downloaded the matching chromedriver. If you replace the binaries from @CPak's answer (or my earlier one) with these two files, the implementation should work.
I'm still trying to find where to obtain more recent versions of the headless-chromium binaries for when these versions stop working. When I find it, I'll post it here.