ETL using AWS Lambda and Redshift

Posted 2019-06-17 23:49:02. Tags: amazon-web-services, aws-lambda, amazon-kinesis

Background: I'd like to collect data from an existing system to perform analytics processing.

The existing system exposes a REST endpoint.

Hard requirement: this is an MVP (minimum viable product), so AWS Lambda or something similarly lightweight is preferred. The data should end up in Redshift; no extra storage or archival is required (no need to store it in S3).

My plan is to use AWS Lambda to collect the data at regular intervals, transform it, and store it in AWS Redshift.

What is the suggested approach?

Soln #1: AWS Lambda for transformation + use a PostgreSQL driver to insert?

Soln #2: AWS Lambda for transformation + push to AWS Kinesis => copy to AWS Redshift?

Any other solutions?

1 answer

What data volume do you need to ingest into Redshift? Say you schedule the Lambda to run every 30 minutes, fetch a batch of data, and insert it into Redshift: make sure each run can complete within Lambda's 15-minute maximum execution time.
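For illustration, here is a rough sketch of that scheduled-batch approach (Soln #1). Everything specific in it is an assumption rather than something from the question: the `SOURCE_URL` and `REDSHIFT_DSN` environment variables, the `analytics.events` table and its three columns, the record shape, and the batch size. It assumes `psycopg2` is packaged with the function, batches rows into multi-row `INSERT` statements, and checks the Lambda context's remaining time so a run stops before hitting the timeout.

```python
import json
import urllib.request

BATCH_SIZE = 500  # rows per INSERT statement; an assumed value, tune to your data


def transform(record):
    """Flatten one source record into a row tuple (hypothetical schema)."""
    return (record["id"], record["name"].strip(), float(record["value"]))


def chunked(rows, size):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_insert(table, columns, rows):
    """Build one multi-row INSERT with %s placeholders and its flat parameter list."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_placeholder] * len(rows))
    )
    params = [value for row in rows for value in row]
    return sql, params


def load(conn, rows, context=None, min_remaining_ms=30_000):
    """Insert rows batch by batch; stop early if the Lambda is close to timing out."""
    inserted = 0
    cur = conn.cursor()
    for batch in chunked(rows, BATCH_SIZE):
        if context is not None and context.get_remaining_time_in_millis() < min_remaining_ms:
            break  # leave the rest for the next scheduled run
        # Table and column names are hypothetical.
        sql, params = build_insert("analytics.events", ("id", "name", "value"), batch)
        cur.execute(sql, params)
        inserted += len(batch)
    conn.commit()
    return inserted


def handler(event, context):
    """Lambda entry point, intended to be triggered on a schedule."""
    import os
    import psycopg2  # assumed to be bundled in the deployment package

    with urllib.request.urlopen(os.environ["SOURCE_URL"]) as resp:
        records = json.load(resp)  # assumes the endpoint returns a JSON array
    conn = psycopg2.connect(os.environ["REDSHIFT_DSN"])
    try:
        return {"inserted": load(conn, [transform(r) for r in records], context)}
    finally:
        conn.close()
```

Multi-row inserts are used here because Redshift handles many single-row `INSERT` statements poorly; for larger volumes a `COPY`-based load is generally the better fit.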

I prefer Lambda --> Kinesis Firehose --> Redshift because it scales better. But if the volume is small or cost is a factor, then your Soln #1 is also a good choice.
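A minimal sketch of the Lambda side of the Firehose route might look like the following. The delivery stream name `etl-to-redshift` and the `rows` key in the event are hypothetical, and the stream itself (with its Redshift destination) is assumed to be configured already. Note that a Firehose delivery stream targeting Redshift stages the data in an S3 bucket and then issues a `COPY`, so this route does involve S3 as an intermediate step. The code sends newline-delimited JSON through `put_record_batch` and collects any records that Firehose reports as failed.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_records(rows):
    """Encode each row as one newline-terminated JSON document."""
    return [{"Data": (json.dumps(row) + "\n").encode("utf-8")} for row in rows]


def send(client, stream_name, rows):
    """Send rows in batches; return the records Firehose reported as failed."""
    records = to_firehose_records(rows)
    failed = []
    for start in range(0, len(records), MAX_BATCH):
        batch = records[start:start + MAX_BATCH]
        resp = client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        if resp.get("FailedPutCount", 0):
            # RequestResponses is ordered like the request; failures carry an ErrorCode.
            failed.extend(
                rec
                for rec, result in zip(batch, resp["RequestResponses"])
                if "ErrorCode" in result
            )
    return failed


def handler(event, context):
    """Lambda entry point; the stream name and event shape are hypothetical."""
    import boto3  # included in the AWS Lambda Python runtime

    client = boto3.client("firehose")
    rows = event.get("rows", [])
    failed = send(client, "etl-to-redshift", rows)
    return {"sent": len(rows) - len(failed), "failed": len(failed)}
```

In a real function the failed records would be retried or logged rather than only counted.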