

Kinesis Stream and Kinesis Firehose Updating Elasticsearch Indexes

We want to use Kinesis Stream and Kinesis Firehose to update an AWS-managed Elasticsearch cluster. We have hundreds of different indexes (corresponding to our DB shards) that need to be updated. When creating the Firehose it requires that I specify the specific index name I want updated. Does that mean I need to create a separate Firehose for each index in my cluster? Or is there a way to configure the Firehose so it knows which index to use based on the content of the data?

Also, we would have 20 or so separate producers that would send data to a Kinesis stream (each one of these producers would generate data for 10 different indexes). Would I also need a separate Kinesis stream for each producer?

Summary: 20 producers (EC2 instances) -> each producer sends data for 20 different indexes to a Kinesis stream -> the Kinesis stream then uses a Firehose to update a single cluster which has 200 indexes in it.

Note: all of the indexes have the same mapping and name template, i.e. index_1, index_2...index_200.

Edit: As we reindex the data we create new indexes along the lines of index_1-v2. Obviously we won't want to create a new Firehose for each index version as they're being created. The new index name can be included in the JSON that's sent to the Kinesis stream.
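For illustration, a record could carry the target index name next to the document itself. A minimal producer-side sketch with boto3 follows; the stream name, field names, and payload shape here are assumptions, not anything prescribed by Kinesis:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def send_document(doc, index_name, stream_name="search-updates"):
    """Wrap the document with its target index so a consumer can route it."""
    payload = {
        "index": index_name,  # e.g. "index_42", or "index_42-v2" after a reindex
        "doc": doc,
    }
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=index_name,  # one routing option; trade-offs discussed in the answer below
    )
```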

As you guessed, Firehose is the wrong solution for this problem, at least as stated. It is designed for situations where there's a 1:1 correspondence between stream (not producer!) and index. Things like clickstream data or log aggregation.

For any solution, you'll need to provide a mechanism to identify which index a record belongs to. You could do this by creating a separate Kinesis stream per message type (in which case you could use Firehose), but this would mean that your producers have to decide which stream to write each message to. That may cause unwanted complexity in your producers, and may also increase your costs unacceptably.

So, assuming that you want a single stream for all messages, you need a consumer application and some way to group those messages. You could include a message type (/ index name) in the record itself, or use the partition key for that purpose. The partition key makes for a somewhat easier implementation, as it guarantees that records for the same index will be stored on the same shard, but it means that your producers may be throttled.
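As a sketch of that grouping step, assuming each record embeds the index name as in the producer example above (the field names remain assumptions), a consumer could bucket decoded records before writing to Elasticsearch:

```python
import json
from collections import defaultdict

def group_by_index(raw_records):
    """Bucket decoded Kinesis record payloads by the index name each one carries."""
    batches = defaultdict(list)
    for raw in raw_records:
        record = json.loads(raw)
        batches[record["index"]].append(record["doc"])
    return batches  # e.g. {"index_1": [...], "index_2-v2": [...]}
```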

For the consumer, you could use an always-on application that runs on EC2, or have the stream invoke a Lambda function.

Using Lambda is nice if you're using the partition key to identify the message type, because each invocation only looks at a single shard (you may still have multiple partition keys in the invocation). On the downside, Lambda will poll the stream once per second, which may result in throttling if you have multiple stream consumers (with a stand-alone app you can control how often it polls the stream).
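A minimal sketch of the Lambda route, assuming the same payload shape as above and the elasticsearch-py client; the domain endpoint is a placeholder, and request signing for an AWS-managed domain (e.g. with requests-aws4auth) is omitted for brevity:

```python
import base64
import json
from collections import defaultdict

from elasticsearch import Elasticsearch, helpers

# Placeholder endpoint; an AWS-managed domain normally also needs signed requests.
es = Elasticsearch(["https://my-es-domain.example.com:443"])

def handler(event, context):
    """Decode the Kinesis batch, group by target index, and bulk-index per index."""
    batches = defaultdict(list)
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        batches[payload["index"]].append(payload["doc"])

    for index_name, docs in batches.items():
        actions = ({"_index": index_name, "_source": doc} for doc in docs)
        helpers.bulk(es, actions)
```

Since each invocation receives a batch from a single shard, using the index name as the partition key means a given batch will only contain the indexes that hash to that shard.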


 