
Why does Selenium with Chrome driver work locally but crash on AWS Lambda?

I am writing a small Python application to scrape websites, which I want to package into a container and deploy on AWS Lambda.

I wrote a Docker setup that works well when I test it locally (following the guide). However, when I deploy it on AWS, Chrome fails to start when Selenium launches it. The error message is not very insightful:

[1670327680.879][INFO]: Launching chrome: /opt/chrome/google-chrome --allow-pre-commit-input --allow-running-insecure-content --data-path=/tmp/data-path --disable-accelerated-2d-canvas --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-gpu --disable-hang-monitor --disable-ipv6 --disable-notifications --disable-popup-blocking --disable-prompt-on-repost --disable-setuid-sandbox --disable-sync --disable-web-security --disk-cache-dir=/tmp/cache-dir --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --headless --hide-scrollbars --homedir=/tmp --ignore-certificate-errors --lang=en-GB --log-level=0 --mute-audio --no-cache --no-first-run --no-sandbox --no-service-autorun --password-store=basic --remote-debugging-port=0 --start-maximized --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/user-data --v=99 --window-size=1472,828
[1670327682.287][SEVERE]: CreatePlatformSocket() failed: Address family not supported by protocol (97)
[1670327682.287][SEVERE]: CreatePlatformSocket() failed: Address family not supported by protocol (97)
[1670327688.346][INFO]: [d53e8f7697487d8804187ea37ebb32ea] RESPONSE InitSession ERROR unknown error: Chrome failed to start: crashed.

Since it worked in my local tests, I ruled out problems with versions, dependencies and so on. Looking online, the only differences I could find between my local environment and the Lambda one are the filesystem permissions and possibly the support for IPv6. I tried to account for those with the options passed to Chrome, but it did not help. I also tried putting the Chrome installation into the /tmp directory, as suggested in another similar question, but that did not work either.

I am installing Chrome and the Chrome driver with this script:

# Map each Chrome version to its Chromium snapshot revision.
# String keys need an associative array, hence declare -A (bash 4+).
declare -A chrome_versions=( ['109.0.5414.25']='1070081' )
chrome_drivers=( "109.0.5414.25" )

# Download Chrome

for br in "${!chrome_versions[@]}"
do
    echo "Downloading Chrome version $br"
    mkdir -p "/opt/chrome/$br"
    curl -Lo "/opt/chrome/$br/chrome-linux.zip" "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F${chrome_versions[$br]}%2Fchrome-linux.zip?alt=media"
    unzip -q "/opt/chrome/$br/chrome-linux.zip" -d "/opt/chrome/$br/"
    # The glob must stay outside the quotes so that it expands
    mv "/opt/chrome/$br/chrome-linux/"* /opt/chrome/
    ln -s /opt/chrome/chrome /opt/chrome/google-chrome
    rm -rf "/opt/chrome/$br/chrome-linux" "/opt/chrome/$br/chrome-linux.zip"
done

# Download Chromedriver

for dr in "${chrome_drivers[@]}"
do
    echo "Downloading Chromedriver version $dr"
    mkdir -p "/opt/chromedriver/$dr"
    curl -Lo "/opt/chromedriver/$dr/chromedriver_linux64.zip" "https://chromedriver.storage.googleapis.com/$dr/chromedriver_linux64.zip"
    unzip -q "/opt/chromedriver/$dr/chromedriver_linux64.zip" -d "/opt/chromedriver/"
    chmod +x "/opt/chromedriver/chromedriver"
    rm -rf "/opt/chromedriver/$dr/chromedriver_linux64.zip"
done

For the purpose of testing, I am launching the driver with

from selenium import webdriver

# `options` is a webdriver.ChromeOptions instance built elsewhere in the project
driver = webdriver.Chrome(options=options, service_log_path='/tmp/chromedriver.log')
driver.get("https://www.selenium.dev/selenium/web/web-form.html")

You can find the whole project here (WIP).

What frustrates me is that I cannot get any information about the nature of the crash, so at this point I am just guessing blindly. Can you give me any tips on how to debug this?

I suspect the Lambda runtime environment is limiting network traffic to the Chrome instance you are trying to run.
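One way to test this kind of suspicion is to log a small probe from inside the Lambda handler before Selenium starts. The sketch below checks whether IPv4 and IPv6 sockets can be created at all (errno 97 in the log is EAFNOSUPPORT on Linux) and whether a few directories are really writable. The function names and the default directory list are made up for illustration and are not part of the original project. Note that the CreatePlatformSocket() message may turn out to be harmless on its own, so treat the result as one data point rather than a diagnosis.

```python
import errno
import socket
import tempfile


def probe_address_family(family):
    """Return None if a TCP socket of this family can be created, else the errno name."""
    try:
        sock = socket.socket(family, socket.SOCK_STREAM)
    except OSError as exc:
        # e.g. "EAFNOSUPPORT" when the address family is unavailable
        return errno.errorcode.get(exc.errno, str(exc.errno))
    sock.close()
    return None


def probe_writable(directory):
    """Return True if a temporary file can actually be created in `directory`."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories, read-only filesystems and permission errors
        return False


def environment_report(directories=("/tmp", "/opt", "/dev/shm")):
    """Collect the probes into one dict that can be printed to the function logs."""
    return {
        "ipv4": probe_address_family(socket.AF_INET),
        "ipv6": probe_address_family(socket.AF_INET6),
        "writable": {d: probe_writable(d) for d in directories},
    }


if __name__ == "__main__":
    print(environment_report())
```

Running the same report locally in the container and on Lambda, then comparing the two outputs, should show which of the assumed differences actually exist.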

I would recommend running the container image on ECS/Fargate, rather than running the image inside the Lambda runtime environment. While Lambda can run containers, it does things a little bit differently and it sounds like what you are trying to do fits better with something closer to running a container on your laptop, than trying to get it to work inside Lambda. Using Fargate, you still won't have to worry about configuring the underlying infrastructure so it is close to Lambda-level complexity.
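If you want to keep debugging inside Lambda before switching, it may help to take Selenium and chromedriver out of the picture and launch the browser binary directly, capturing whatever it writes to stderr. The following is a minimal sketch. `run_and_capture` is a hypothetical helper, and the commands in the usage note are assumptions about what is available in your image.

```python
import subprocess


def _as_text(data):
    """Normalise captured output, which may be None or bytes after a timeout."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_and_capture(command, timeout=30):
    """Run `command` and return (returncode, stdout, stderr).

    returncode is None if the process could not be started or timed out.
    A negative returncode means the process was killed by that signal number.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        return None, _as_text(exc.stdout), _as_text(exc.stderr)
    except OSError as exc:
        # Binary missing, not executable, wrong architecture, ...
        return None, "", str(exc)
    return completed.returncode, completed.stdout, completed.stderr
```

For example, calling `run_and_capture(["/opt/chrome/google-chrome", "--headless", "--no-sandbox", "--disable-gpu", "--dump-dom", "about:blank"])` from the handler and printing the result should surface errors that chromedriver only reports as "crashed". If `ldd` exists in the image, `run_and_capture(["ldd", "/opt/chrome/chrome"])` lists any shared libraries reported as "not found".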


 