
How does Kinesis achieve Kafka style Consumer Groups?

Original 2018-05-05 14:31:58 7 1 apache-kafka/ kafka-consumer-api/ amazon-kinesis

In Kafka, I can split my topic into many partitions. I cannot have more consumers than partitions in Kafka, because the partition is used as a way to scale out a topic. If I have more load, I can increase the number of partitions, which allows me to increase the number of consumers and therefore to have more threads or processes working on a given topic.

In Kafka, there is the concept of a consumer group. If we have 10 consumer groups on a single topic, each consumer group has the opportunity to process every message in the topic. A consumer group still takes advantage of the scalability that partitions provide (i.e. each consumer group can have up to 'n' consumers, where 'n' is the number of partitions in the topic). This is the beauty of Kafka: scalability and multi-channel reading are two separate concepts with two separate knobs to turn.
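To make the two knobs concrete, here is a minimal sketch in plain Python. It does not use any Kafka client; the function `assign_partitions`, the group names, and the consumer names are all made up for illustration, and the round-robin rule is only an assumption standing in for whatever assignment strategy a real cluster uses.

```python
def assign_partitions(num_partitions, consumers):
    """Spread one topic's partitions over the consumers of ONE group.

    Toy round-robin rule (an assumption, not Kafka's actual assignor).
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical setup: one topic with 3 partitions, two consumer groups.
NUM_PARTITIONS = 3
groups = {
    "billing": ["b1", "b2"],                 # fewer consumers than partitions
    "analytics": ["a1", "a2", "a3", "a4"],   # more consumers than partitions
}

# Each group gets its own, complete assignment of all partitions.
plans = {
    name: assign_partitions(NUM_PARTITIONS, members)
    for name, members in groups.items()
}

for name, plan in plans.items():
    print(name, plan)
```

Every group covers all three partitions on its own (multi-channel reading), while inside a group the partitions are divided among its members (scalability). In this toy model the fourth consumer in "analytics" ends up with no partition at all, which is the sense in which extra consumers beyond the partition count add nothing.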

In Kinesis, we are told that if you use the Kinesis Client Library (KCL), you can get the same functionality as consumer groups by defining different Kinesis Applications. In other words, we can have different Kinesis Applications independently streaming all records from the same stream at different times.

We are also told that "Amazon Kinesis Client Library (KCL) automatically creates an Amazon DynamoDB table for each Amazon Kinesis Application to track and maintain state information such as resharding events and sequence number checkpoints."

OK, so I'm getting ready to start reading through the KCL code here, but I'm hoping someone can answer these questions to save me some time.

1. How does the KCL actually do this?
2. Are there diagrams somewhere explaining the process?
3. If I started a new Kinesis Application (MyKinesisApp1) after a record was already produced and consumed by all prior Kinesis Applications, will the new Kinesis Application (MyKinesisApp1) still have an opportunity to consume that record? In other words, does Kinesis remove the record from its stream after it has been processed, or does it leave it there for the 7 days no matter what?

I have seen this question here, but it doesn't answer my question, especially my third one! Also, this question makes a direct comparison between two similar technologies. It will help people who know Kafka learn Kinesis more quickly.

1 solution

1. In the KCL configuration, there is a setting "appName", which corresponds to "Application Name" and plays the same role as a "consumer group" in Kafka. For each consumer group (i.e. each Kinesis Streams Consumer Application) there is a DynamoDB table. You can see an example DynamoDB table in the question "AWS Kinesis leaseOwner confusion" (the KCL appName there is 'quickstats-development').
2. No, as far as I know, there are not. "Kinesis Streams" is similar to Kafka, so Kafka diagrams carry over conceptually, but beyond that there is not much graphical material.
3. Yes. Each Kinesis Consumer Application (the counterpart of a Kafka consumer group) is represented by its own DynamoDB table. That way, different Kinesis Consumer Applications can consume the same record independently. A checkpoint in Kinesis is the counterpart of an offset in Kafka: the checkpoint stored in DynamoDB is the cursor marking the read position in a Kinesis shard. Read this answer for a similar example: https://stackoverflow.com/a/42833193/1622134
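The idea of one checkpoint table per application can be sketched with a toy model in plain Python. This is not the KCL and makes no AWS calls: `ToyStream`, `ToyApp`, the single shard id "shard-0", and the `tables` dict standing in for the per-application DynamoDB tables are all hypothetical. It also assumes that a new application starts from the oldest retained record (the KCL's TRIM_HORIZON initial position, as opposed to LATEST) and it ignores the retention period entirely.

```python
class ToyStream:
    """A single-shard stream. Reading never removes a record."""

    def __init__(self):
        self.records = []

    def put(self, data):
        self.records.append(data)
        return len(self.records) - 1  # toy sequence number

    def read_after(self, checkpoint):
        # Return (sequence_number, data) pairs newer than the checkpoint.
        return [(seq, data) for seq, data in enumerate(self.records)
                if seq > checkpoint]


class ToyApp:
    """One consumer application, identified by its application name."""

    def __init__(self, app_name, stream, tables):
        self.stream = stream
        # One checkpoint table per application name; -1 means
        # "nothing read yet, start from the oldest record".
        self.table = tables.setdefault(app_name, {"shard-0": -1})
        self.seen = []

    def poll(self):
        for seq, data in self.stream.read_after(self.table["shard-0"]):
            self.seen.append(data)
            self.table["shard-0"] = seq  # checkpoint after processing


tables = {}            # stands in for DynamoDB: app name -> checkpoint table
stream = ToyStream()
stream.put("r0")
stream.put("r1")

app0 = ToyApp("MyKinesisApp0", stream, tables)
app0.poll()            # consumes r0 and r1

app1 = ToyApp("MyKinesisApp1", stream, tables)  # started later
app1.poll()            # still sees r0 and r1

stream.put("r2")
app0.poll()            # resumes from its own checkpoint, sees only r2

print(app0.seen, app1.seen, tables)
```

Because each application only advances its own cursor, consuming a record in one application has no effect on what another application can read. In the real service the limit is the stream's retention period, not whether some other application has already processed the record.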