
Why does pyppeteer take such a long time to load a single webpage on AWS Lambda?

I am currently trying to crawl MVN Repository using pyppeteer on AWS Lambda. However, my test function runs for the full 15 minutes and then fails with a timeout (see below). It seems the browser is opened, but it never crawls the page.

Here is my current code:

import json
import asyncio
from pyppeteer import launch
import pyppeteer
import zipfile
import boto3
import time
# import pandas as pd
import os
import logging
import subprocess
from pyppeteer.launcher import Launcher

logger = logging.getLogger()
logger.setLevel(logging.INFO)

pyppeteer.DEBUG = True

async def main(name, url):
    browser = await launch(headless=True, args=["--no-sandbox"], executablePath="/opt/python/headless-chromium")
    page = await browser.newPage()
    await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36')
    await page.goto(url)


def lambda_handler(event, context):
    asyncio.get_event_loop().run_until_complete(main('lol','https://mvnrepository.com/artifact/com.adobe.xmp/xmpcore'))
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

The layers for this are:

The following is the output after the function has timed out:

Test Event Name
dd

Response
{
  "errorMessage": "2022-04-22T06:28:32.470Z e9be66b9-1fd0-4df9-a0b4-9815067169cd Task timed out after 900.10 seconds"
}

Function Logs
START RequestId: e9be66b9-1fd0-4df9-a0b4-9815067169cd Version: $LATEST
[INFO]  2022-04-22T06:13:32.424Z    e9be66b9-1fd0-4df9-a0b4-9815067169cd    Found credentials in environment variables.
[I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:51625/devtools/browser/1651a2a3-9b53-4f0a-883f-4850a6d693ed
END RequestId: e9be66b9-1fd0-4df9-a0b4-9815067169cd
REPORT RequestId: e9be66b9-1fd0-4df9-a0b4-9815067169cd  Duration: 900104.69 ms  Billed Duration: 900000 ms  Memory Size: 10240 MB   Max Memory Used: 364 MB Init Duration: 490.52 ms    
2022-04-22T06:28:32.470Z e9be66b9-1fd0-4df9-a0b4-9815067169cd Task timed out after 900.10 seconds

Request ID
e9be66b9-1fd0-4df9-a0b4-9815067169cd
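
The log above stops right after `Browser listening on` and shows nothing else until the 900-second limit, so it does not say which call is stuck. One way to narrow that down might be to bound each awaited step with `asyncio.wait_for`, so the function fails quickly and names the step. This is only a minimal sketch: `bounded` and `crawl` are hypothetical names, the 60-second budget is an arbitrary assumption, and the pyppeteer calls are the same kind used in the code above (plus `page.title()` and `browser.close()`).

```python
import asyncio


async def bounded(step, awaitable, seconds):
    """Await `awaitable`, raising a TimeoutError that names `step` if it stalls."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"step '{step}' did not finish within {seconds}s") from None


async def crawl(url, executable_path="/opt/python/headless-chromium", budget=60):
    # Imported here so `bounded` can be tried without pyppeteer installed.
    from pyppeteer import launch

    browser = await bounded(
        "launch",
        launch(headless=True, args=["--no-sandbox"], executablePath=executable_path),
        budget,
    )
    try:
        page = await bounded("newPage", browser.newPage(), budget)
        await bounded("goto", page.goto(url), budget)
        return await bounded("title", page.title(), budget)
    finally:
        # Always try to release the browser, but do not let cleanup hang either.
        await bounded("close", browser.close(), budget)
```

Running `crawl(url)` from `lambda_handler` in place of `main(...)` should then report the stalled step in the function logs rather than waiting for the Lambda limit.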

Besides the method above, I also followed these tutorials, but to no avail:

P.S. I am able to run the above script on my local machine with no issues.

I built a similar configuration, but using pyppeteer 1.0.2. When I tried to generate a PDF file from the URL you mentioned (mvnrepository), I ran into an ugly captcha (screen). Have you tried crawling other websites? The captcha could be the problem.

Please let me know if you find a workaround.
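
To check whether a captcha is what comes back, one option is to inspect the returned HTML for challenge text before doing anything else with the page. The following is a rough sketch, not a reliable detector: `CHALLENGE_MARKERS`, `looks_like_challenge`, and `is_blocked` are hypothetical names, and the marker strings are guesses that would need adjusting after looking at the page actually served.

```python
# Hypothetical marker strings; adjust them after inspecting the real response.
CHALLENGE_MARKERS = ("captcha", "verify you are human", "just a moment")


def looks_like_challenge(html):
    """Return True if the HTML contains any of the assumed challenge markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


async def is_blocked(page, url):
    """Load `url` in an existing pyppeteer page and test the HTML it returns."""
    await page.goto(url)
    html = await page.content()
    return looks_like_challenge(html)
```

Running the same check against an unrelated site would help tell a site-specific block apart from a problem with the Lambda setup itself.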

