Hosting a node web crawler?

I've got a crawler, written in Node.js, that checks a list of URLs every 60 seconds. It uses no database, stores a few items in memory, and should run 24/7.

What's a proper solution for hosting this crawler?

As far as I understand, AWS bills per second, which I guess would make a 24/7 process pretty expensive? Or maybe I'm missing something here; the AWS docs are pretty confusing imo.

The tool sounds light enough (based on its purpose), so I'd go for a serverless solution to reduce the operational footprint: either a Lambda function or ECS Fargate. Here's what you'd expect to pay:

For Lambda, assuming 512 MB running for 5 seconds per call at $0.0000008333 per 100 ms: 60 calls per hour * 24 hours * 30 days = 43,200 calls, and 43,200 * 50 * 0.0000008333 ~= $1.8 per month (50 is the number of 100 ms units in a 5-second call).
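To make the Lambda arithmetic easier to re-run with other durations, here is a minimal sketch of the same estimate. The rate is the one quoted above and may not match current AWS pricing; the function name `lambdaMonthlyCost` and its parameters are made up for illustration. It only covers compute time and ignores the per-request charge and any free tier.

```javascript
// Rate quoted in the answer for a 512 MB function (assumption: verify against
// the current AWS price list before relying on it).
const LAMBDA_PRICE_PER_100MS_512MB = 0.0000008333;

// Rough monthly compute cost for a function invoked on a fixed schedule.
function lambdaMonthlyCost({ callsPerHour, secondsPerCall, days = 30 }) {
  const invocations = callsPerHour * 24 * days;
  // Number of 100 ms units per call, rounded up.
  const unitsPerCall = Math.ceil(secondsPerCall * 10);
  return invocations * unitsPerCall * LAMBDA_PRICE_PER_100MS_512MB;
}

// One call per minute, 5 seconds each, as in the estimate above.
const lambdaCost = lambdaMonthlyCost({ callsPerHour: 60, secondsPerCall: 5 });
console.log(lambdaCost.toFixed(2)); // 1.80
```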

https://s3.amazonaws.com/lambda-tools/pricing-calculator.html

For Fargate at the smallest footprint, 0.25 vCPU and 0.5 GB memory: ((0.25 * 0.01239249) + (0.5 * 0.00136079)) * 24 * 30 ~= $2.7 per month
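The Fargate figure can be reproduced the same way. This is a rough sketch only: the two hourly rates are copied from the formula above and should be checked against current pricing for your region and capacity type, and `fargateMonthlyCost` is a hypothetical helper name.

```javascript
// Hourly rates taken from the formula above (assumption: they may be outdated
// or specific to one region or capacity type).
const FARGATE_VCPU_PER_HOUR = 0.01239249;
const FARGATE_GB_PER_HOUR = 0.00136079;

// Rough monthly cost of one task that runs continuously.
function fargateMonthlyCost({ vcpu, memoryGb, days = 30 }) {
  const hourly = vcpu * FARGATE_VCPU_PER_HOUR + memoryGb * FARGATE_GB_PER_HOUR;
  return hourly * 24 * days;
}

// Smallest task size mentioned above: 0.25 vCPU and 0.5 GB.
const fargateCost = fargateMonthlyCost({ vcpu: 0.25, memoryGb: 0.5 });
console.log(fargateCost.toFixed(2)); // 2.72
```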

Use caution with those numbers; this is just a quick draft. Both options are fairly cheap, but Lambda might be easier to work with, plus you indicated that you don't need the items in memory to persist between calls.
