
ETL using AWS Lambda and Redshift

Background: I'd like to collect data from an existing system to perform analytics processing.

The existing system exposes a REST endpoint.

Hard requirement: MVP (minimum viable product) => preferably AWS Lambda or something light, and the data should end up in Redshift. No extra storage/archival is required (no need to store in S3).

My plan is to use AWS Lambda to collect data at intervals, transform it, and store it in AWS Redshift.

What is the suggested approach?

Soln #1: AWS Lambda for transformation + use a PostgreSQL driver to insert?

Soln #2: AWS Lambda for transformation + push to AWS Kinesis => copy to AWS Redshift?

Any other solutions?

What is the data volume that you need to ingest into Redshift? Let's say you schedule Lambda to run every 30 minutes, get a batch of data, and insert it into Redshift. Make sure each run can complete within Lambda's 15-minute execution time limit.
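A minimal sketch of that scheduled-batch approach (Soln #1), not a definitive implementation. Everything named here is an assumption for illustration: the `SOURCE_URL` endpoint, the `events` table with `id`, `ts`, and `value` columns, the `REDSHIFT_*` environment variables, and the availability of `psycopg2` in the deployment package. It groups rows into multi-row `INSERT` statements and stops early when the remaining execution time gets low, so a run never hits the 15-minute limit mid-statement.

```python
import json
import os
import urllib.request

SOURCE_URL = "https://example.com/api/events"  # hypothetical REST endpoint
BATCH_SIZE = 500            # rows per multi-row INSERT (tunable assumption)
SAFETY_MARGIN_MS = 60_000   # stop loading when less than 1 minute remains


def fetch_records(url=SOURCE_URL):
    """Fetch a JSON array of records from the source system."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def transform(record):
    """Map one source record to a row tuple (field names are hypothetical)."""
    return (str(record["id"]), record["ts"], float(record["value"]))


def chunks(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load(rows, cursor, remaining_ms):
    """Insert rows in batches; stop early if the time budget runs low."""
    inserted = 0
    for batch in chunks(rows, BATCH_SIZE):
        if remaining_ms() < SAFETY_MARGIN_MS:
            break  # leave the rest for the next scheduled run
        placeholders = ", ".join(["(%s, %s, %s)"] * len(batch))
        params = [value for row in batch for value in row]
        cursor.execute(
            "INSERT INTO events (id, ts, value) VALUES " + placeholders,
            params,
        )
        inserted += len(batch)
    return inserted


def handler(event, context):
    """Lambda entry point, invoked on a schedule (e.g. every 30 minutes)."""
    import psycopg2  # assumption: bundled with the function or in a layer

    rows = [transform(r) for r in fetch_records()]
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=5439,
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        # `with conn` commits on success and rolls back on an exception.
        with conn, conn.cursor() as cur:
            inserted = load(rows, cur, context.get_remaining_time_in_millis)
    finally:
        conn.close()
    return {"fetched": len(rows), "inserted": inserted}
```

If a run stops early, the rows it skipped are not retried automatically in this sketch; you would need some way to track the last record loaded (for example a timestamp watermark) so the next run can resume from there.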

I prefer Lambda --> Kinesis Firehose --> Redshift as it scales better. But if the volume is small or cost is a factor, then #1 is also a good choice.
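A rough sketch of the Lambda side of the Firehose option, under stated assumptions: the delivery stream name `etl-to-redshift` is hypothetical, the stream is assumed to be already configured with Redshift as its destination, and the rows are assumed to arrive in the invocation event purely for illustration. Note that Firehose delivers to Redshift by staging the data in S3 and then issuing a `COPY`, so this path does involve an S3 bucket even though the question would rather avoid one.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call
STREAM_NAME = "etl-to-redshift"  # hypothetical delivery stream name


def to_firehose_records(rows):
    """Encode each row as one newline-terminated JSON document."""
    return [{"Data": (json.dumps(row) + "\n").encode("utf-8")} for row in rows]


def send(rows, client, stream_name=STREAM_NAME):
    """Send rows in batches; return how many records Firehose rejected."""
    records = to_firehose_records(rows)
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
        # A batch can partially fail; the response reports the count.
        failed += response["FailedPutCount"]
    return failed


def handler(event, context):
    """Lambda entry point: push already-transformed rows to Firehose."""
    import boto3  # available by default in the Lambda Python runtime

    client = boto3.client("firehose")
    rows = event.get("rows", [])  # assumption: rows passed in the event
    failed = send(rows, client)
    return {"sent": len(rows), "failed": failed}
```

This sketch only counts rejected records. A more careful version would inspect `RequestResponses` in each response and resend the entries that carry an error code.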

