
Populate DynamoDB table from Kinesis stream/firehose

Problem

What is the recommended way to populate a DynamoDB table with data coming from a Kinesis data source (stream or firehose)?

Current workflow

  • Data is ingested into Kinesis Firehose
  • A Lambda function triggers on every record written to Kinesis Firehose and sends the data to DynamoDB
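The per-record step above can be sketched as a minimal Lambda handler. This assumes a Kinesis stream event source (records arrive base64-encoded under `Records[i].kinesis.data`) and a hypothetical table name; both are illustrations, not part of the original question.

```python
import base64
import json

TABLE_NAME = "events"  # hypothetical table name for illustration


def decode_record(record):
    """Kinesis delivers payloads base64-encoded; decode one record to a dict."""
    payload = base64.b64decode(record["kinesis"]["data"])
    return json.loads(payload)


def handler(event, context):
    # boto3 is preinstalled in the AWS Lambda runtime.
    import boto3

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    for record in event["Records"]:
        # One put_item per record; this is the cost problem the answer
        # below addresses with batching.
        table.put_item(Item=decode_record(record))
```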

Why

I would like to get some advice on this because

  • I am not sure whether this approach creates more work than necessary, i.e. I need to write and maintain code for the Lambda function
  • I see that I can configure the likes of Redshift or S3 as a consumer of my Kinesis data source. Why can't I do the same with DynamoDB? Is there a reason for this? Are other people not using this kind of workflow?

In my opinion, your workflow is more or less the right way to do it. The only thing I would change is to use Kinesis Streams instead of Firehose. You can then configure your stream as your Lambda event source, which has an option to configure the batch size. This will greatly decrease your Lambda costs, because instead of one Lambda execution per record, you will have one execution per batch (of, say, 500 records). Details are explained in the AWS documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html).
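The batch-size setting lives on the event source mapping between the stream and the function. A sketch of creating one via boto3 follows; the stream ARN and function name are hypothetical placeholders.

```python
def enable_batching(stream_arn, function_name, batch_size=500):
    """Map a Kinesis stream to a Lambda function with batching enabled.

    With BatchSize=500, Lambda invokes the function once per batch of up
    to 500 records rather than once per record.
    """
    import boto3  # requires the AWS SDK and valid credentials

    client = boto3.client("lambda")
    return client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        BatchSize=batch_size,        # records per invocation
        StartingPosition="LATEST",   # or "TRIM_HORIZON" to replay the stream
    )


# Example (hypothetical ARN and function name):
# enable_batching(
#     "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
#     "my-dynamodb-writer",
# )
```

Inside the function, `Table.batch_writer()` can then flush the whole batch to DynamoDB in 25-item `BatchWriteItem` calls instead of one `PutItem` per record.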

I am not exactly sure about the real reasons for not providing DynamoDB as a destination. My guess is that Kinesis doesn't know the structure of your content. The current Kinesis destinations either have some mechanism to structure incoming data for their needs, or they don't care about the object structure at all (S3). DynamoDB, on the other hand, requires decisions from the user, and those architectural decisions are highly important for each table (performance, cost, partitioning, access patterns, etc.): which field will be your partition key? Will you use a sort key? Will you format any of your fields? How will you make sure your primary key values are unique? What will be the type of each field (String, Number, etc.)? I think Lambda is the most suitable mechanism for those decisions because of its flexibility.
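Those per-table decisions are exactly the kind of transformation a Lambda function would encode. A small sketch, under an assumed schema (device_id as partition key, timestamp as sort key; both names are hypothetical), including the boto3 requirement that numbers be `Decimal` rather than `float`:

```python
from decimal import Decimal


def to_dynamodb_item(event):
    """Apply the table-specific decisions Kinesis cannot make for us:
    choose the key fields and coerce types to what DynamoDB accepts."""
    # Hypothetical schema: partition key from device_id, sort key from timestamp.
    item = {
        "pk": event["device_id"],
        "sk": event["timestamp"],
    }
    for name, value in event.items():
        if name in ("device_id", "timestamp"):
            continue
        # boto3 rejects float for DynamoDB Number attributes; use Decimal.
        item[name] = Decimal(str(value)) if isinstance(value, float) else value
    return item
```

The handler would call this on each decoded record before `put_item`, keeping the schema logic in one testable place.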

There are automated mechanisms to infer a schema from the data itself (as AWS Glue does), but in DynamoDB's case it is not that simple.
