I'm trying to scrape some websites. It works locally (an Express server on localhost), but not once I've deployed it to Lambda. I've tried the following: serverless-http, aws-serverless-express, and the serverless-express plugin, and I've also tried switching between axios and superagent.
The routes work fine, and after hours of investigating I've narrowed the problem down to the fetch/axios call. When I don't add a timeout to axios/superagent, the app just keeps running until it hits the Lambda timeout (15 or 30 seconds, whichever is set) and returns a 50x error.
```yaml
service: scrape

provider:
  name: aws
  runtime: nodejs10.x
  stage: dev
  region: us-east-2
  memorySize: 128
  timeout: 15

plugins:
  - serverless-plugin-typescript
  - serverless-express

functions:
  app:
    handler: src/server.handler
    events:
      - http:
          path: /
          method: ANY
          cors: true
      - http:
          path: /{proxy+}
          method: ANY
          cors: true
```
```typescript
// Assumes: import Axios from 'axios'; import Cheerio from 'cheerio';
protected async fetchHtml(uri: string): Promise<CheerioStatic | null> {
  // Without an explicit timeout, axios waits indefinitely and the
  // Lambda runs until its own timeout fires.
  const response = await Axios.get(uri, { timeout: 5000 });
  if (response.status === 200) {
    const $ = Cheerio.load(response.data || '');
    $('script').remove(); // strip inline scripts before parsing
    return $;
  }
  return null;
}
```
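For what it's worth, axios rejects the promise when its `timeout` fires, so wrapping the call lets the handler return a clean error response instead of hanging until the Lambda timeout produces a 50x. This is a minimal sketch; `safeFetchHtml` is a hypothetical wrapper name, not part of axios or serverless-express:

```typescript
// Wrap a fetch function so a rejection (e.g. an axios timeout) is
// converted into an explicit 504 response instead of an unhandled error.
async function safeFetchHtml(
  fetch: () => Promise<string>
): Promise<{ status: number; body: string }> {
  try {
    return { status: 200, body: await fetch() };
  } catch {
    // axios timeout rejections typically carry code 'ECONNABORTED'
    return { status: 504, body: "upstream fetch timed out or failed" };
  }
}
```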
As far as I know, the default timeout of axios is 0, i.e. it waits indefinitely. Also remember that API Gateway has a hard limit of 29 seconds, so keep your function timeout below that.
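Since the default is to wait forever, a client-agnostic way to enforce a deadline (it works the same whether you use axios or superagent) is a `Promise.race` wrapper. A minimal sketch; `withTimeout` is a hypothetical helper, not a library API:

```typescript
// Race the request against a timer so the Lambda fails fast instead of
// running until API Gateway's 29-second hard limit kills the request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it doesn't keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```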
I had the same issue recently; sometimes the timeouts are due to cold starts. I ended up adding retry logic for the API call in my frontend React application.
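The retry approach can be sketched roughly like this, with exponential backoff between attempts. `fetchWithRetry` and its parameters are hypothetical names for illustration, not from any specific library:

```typescript
// Retry a failing async call with exponential backoff, to ride out
// cold-start timeouts. Delays grow as delayMs, 2*delayMs, 4*delayMs, ...
async function fetchWithRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
      }
    }
  }
  // All attempts failed; surface the last error to the caller.
  throw lastError;
}
```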