Kinesis Stream and Kinesis Firehose Updating Elasticsearch Indexes

Original post: 2018-07-24 15:12:55 · Tags: amazon-web-services, elasticsearch, amazon-kinesis, amazon-kinesis-firehose

We want to use a Kinesis stream and Firehose to update an AWS-managed Elasticsearch cluster. We have hundreds of different indexes (corresponding to our DB shards) that need to be updated. When I create the Firehose, it requires me to specify the name of the index to update. Does that mean I need to create a separate Firehose for each index in my cluster? Or is there a way to configure the Firehose so that it knows which index to use based on the content of the data?

Also, we would have 20 or so separate producers sending data to a Kinesis stream (each of these producers would generate data for 10 different indexes). Would I also need a separate Kinesis stream for each producer?

Summary: 20 producers (EC2 instances) -> each producer sends data for 10 different indexes to a Kinesis stream -> the Kinesis stream then uses a Firehose to update a single cluster, which has 200 indexes in it.

Note: all of the indexes have the same mapping and name template, i.e. index_1, index_2 ... index_200.

Edit: As we reindex the data, we create new indexes along the lines of index_1-v2. Obviously we don't want to create a new Firehose for each index version as it is created. The new index name can be included in the JSON that's sent to the Kinesis stream.

1 answer

As you guessed, Firehose is the wrong solution for this problem, at least as stated. It is designed for situations where there's a 1:1 correspondence between stream (not producer!) and index. Things like clickstream data or log aggregation.

For any solution, you'll need to provide a mechanism to identify which index a record belongs to. You could do this by creating a separate Kinesis stream per message type (in which case you could use Firehose), but this would mean that your producers have to decide which stream to write each message to. That may cause unwanted complexity in your producers, and may also increase your costs unacceptably.

So, assuming that you want a single stream for all messages, you need a consumer application and some way to group those messages. You could include a message type (or index name) in the record itself, or use the partition key for that purpose. The partition key makes for a somewhat easier implementation, as it guarantees that records for the same index will be stored on the same shard. The trade-off is that your producers may be throttled: each shard has a fixed write limit, so a busy index concentrates its writes on a single shard.
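As a rough illustration of the producer side, here is a minimal Python sketch that carries the index name both in the JSON payload and as the partition key. The stream name `index-updates`, the payload keys `index` and `doc`, and the helper `build_put_record_args` are assumptions made for this example, not anything from the question. The returned keys (`StreamName`, `Data`, `PartitionKey`) match the parameters of boto3's Kinesis `put_record` call.

```python
import json


def build_put_record_args(stream_name, index_name, document):
    """Return keyword arguments for boto3's Kinesis put_record call.

    The index name travels in the payload and is also used as the
    partition key, so records for one index land on the same shard.
    """
    payload = {"index": index_name, "doc": document}  # hypothetical envelope
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": index_name,
    }


args = build_put_record_args(
    "index-updates",                      # assumed stream name
    "index_1-v2",                         # index name, including its version
    {"id": "42", "title": "example"},
)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("kinesis").put_record(**args)
```

Because the versioned name (index_1-v2) is part of each record, switching to a new index version needs no change to the stream or the consumer.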

For the consumer, you could use an always-on application that runs on EC2, or have the stream invoke a Lambda function.

Using Lambda is nice if you're using the partition key to identify the message type, because each invocation only looks at a single shard (you may still have multiple partition keys in one invocation). On the downside, Lambda will poll the stream once per second, which may result in throttling if you have multiple stream consumers (with a stand-alone app you can control how often it polls the stream).
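A Lambda consumer along these lines might look like the sketch below. It decodes the base64 record data that Lambda delivers for Kinesis events, groups documents by index name (falling back to the partition key), and builds a newline-delimited body in the format of the Elasticsearch bulk API. The payload keys `index` and `doc`, the use of `doc["id"]` as the document ID, and the function names are assumptions for illustration. Sending the body to the cluster is left out, and depending on the Elasticsearch version the action line may also need a `_type` field.

```python
import base64
import json
from collections import defaultdict


def group_by_index(event):
    """Group decoded documents from a Kinesis-triggered Lambda event by index."""
    grouped = defaultdict(list)
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(raw)
        # Prefer the index name in the payload; fall back to the partition key.
        index_name = payload.get("index") or record["kinesis"]["partitionKey"]
        grouped[index_name].append(payload["doc"])
    return grouped


def build_bulk_body(grouped):
    """Build an NDJSON bulk body: one action line followed by one source line."""
    lines = []
    for index_name, docs in grouped.items():
        for doc in docs:
            action = {"index": {"_index": index_name, "_id": doc["id"]}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def handler(event, context):
    """Lambda entry point: returns the bulk body instead of sending it."""
    body = build_bulk_body(group_by_index(event))
    # Assumption: a signed HTTP POST of `body` to the cluster's _bulk
    # endpoint would go here; it is omitted from this sketch.
    return body
```

One bulk request can cover many indexes at once, which is what lets a single consumer serve all 200 indexes without a per-index Firehose.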

Auto wire kinesis stream to kinesis firehose?

How does kinesis firehose stream data to self managed elasticsearch?

AWS Kinesis Firehose to ElasticSearch Geo data mapping

AWS: reading Kinesis Stream data using Kinesis Firehose in a different account

Kinesis Data Firehose source `Direct PUT` vs `Kinesis Data Stream`

AWS Kinesis Firehose - using Index Rotation (Elasticsearch)

Call Kinesis Firehose vs Kinesis Stream directly from Lambda

Kinesis Stream to S3 Backup using Firehose

Ordering of streaming data with kinesis stream and firehose

Writing to S3 via Kinesis Stream or Firehose

Answer 1, posted 2018-07-24 15:40:13