
Same consumer group (S3 sink connector) across two different Kafka Connect clusters

I'm migrating Kafka connectors from an ECS cluster to a new cluster running on Kubernetes. I successfully migrated the Postgres source connectors by deleting them and recreating them on the exact same replication slots. They keep writing to the same topics in the same Kafka cluster, and the S3 sink connector in the old cluster continues to read from those topics and write records into S3. Everything works as usual.
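
For the source side, that move can be driven entirely through the Kafka Connect REST API. Here is a minimal sketch of the delete-and-recreate step, assuming a Debezium Postgres source connector; the hosts, connector name, and use of the Python requests library are illustrative, not part of the original setup. Keeping the config (in particular slot.name) identical is what lets the recreated connector resume from the same replication slot:

    import requests

    # Hypothetical endpoints and connector name, for illustration only.
    OLD_CONNECT = "http://old-connect.ecs.internal:8083"
    NEW_CONNECT = "http://new-connect.k8s.internal:8083"
    NAME = "pg-source-orders"

    # Capture the running config, delete the connector on the old cluster,
    # then recreate it on the new cluster. An unchanged "slot.name" means
    # the new connector picks up from the same Postgres replication slot.
    config = requests.get(f"{OLD_CONNECT}/connectors/{NAME}/config").json()
    requests.delete(f"{OLD_CONNECT}/connectors/{NAME}").raise_for_status()
    resp = requests.post(f"{NEW_CONNECT}/connectors",
                         json={"name": NAME, "config": config})
    resp.raise_for_status()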

But now, to move the AWS S3 sink connectors, I first created a non-critical S3 connector in the new cluster with the same name as the one in the old cluster. I was going to wait a few minutes before deleting the old one, to avoid missing data. To my surprise, it looks like (based on the UI provided by akhq.io) the worker running that new S3 connector joins the existing consumer group. I was fully expecting to see duplicated data. Based on the Confluent docs:

All Workers in the cluster use the same three internal topics to share connector configurations, offset data, and status updates. For this reason all distributed worker configurations in the same Connect cluster must have matching config.storage.topic, offset.storage.topic, and status.storage.topic properties.
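
For context, here is a sketch of what each cluster's distributed worker configuration might look like (all topic and group names are illustrative). Each Connect cluster has its own group.id and its own three internal topics, but both point at the same Kafka brokers:

    # Old (ECS) Connect cluster worker.properties (illustrative names)
    bootstrap.servers=kafka-broker:9092
    group.id=connect-cluster-ecs
    config.storage.topic=connect-configs-ecs
    offset.storage.topic=connect-offsets-ecs
    status.storage.topic=connect-status-ecs

    # New (Kubernetes) Connect cluster worker.properties
    bootstrap.servers=kafka-broker:9092
    group.id=connect-cluster-k8s
    config.storage.topic=connect-configs-k8s
    offset.storage.topic=connect-offsets-k8s
    status.storage.topic=connect-status-k8s

The worker group.id only governs rebalancing among the workers of one Connect cluster. A sink connector's records are consumed under a separate consumer group derived from the connector name, and that group lives in the shared __consumer_offsets topic on the brokers, which both Connect clusters talk to.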

So from this "same Connect cluster" wording, I thought having the same consumer group id only mattered within the same Connect cluster. But from my observation, it seems like you can have consumers in different Connect clusters belonging to the same consumer group?

Based on this article, __consumer_offsets is used by consumers, and unlike the other hidden "offset"-related topics, it doesn't have any cluster name designation.
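
You can check this from the broker side: by default, a sink connector's tasks commit offsets under the consumer group connect-<connector name>, with no Connect-cluster qualifier in the group id. A sketch with the kafka-python admin client (broker address and connector name are illustrative):

    from kafka.admin import KafkaAdminClient

    # By default a sink connector consumes under the group
    # "connect-<connector name>", regardless of which Connect
    # cluster hosts its tasks. Names below are illustrative.
    admin = KafkaAdminClient(bootstrap_servers="kafka-broker:9092")

    offsets = admin.list_consumer_group_offsets("connect-s3-sink-events")
    for tp, meta in offsets.items():
        print(tp.topic, tp.partition, meta.offset)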

Does that mean I could simply create the S3 sink connectors in the new Kubernetes cluster and then delete the ones in the ECS cluster, without duplicating or missing data (as long as they have the same name, and therefore the same consumer group)? I'm not sure if this is the pattern people usually use.
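
If you do go that route, the sequence might look like the sketch below (hosts and connector name are hypothetical). Between the create and the delete, tasks from both Connect clusters briefly share the consumer group, and the group simply rebalances:

    import time
    import requests

    OLD_CONNECT = "http://old-connect.ecs.internal:8083"
    NEW_CONNECT = "http://new-connect.k8s.internal:8083"
    NAME = "s3-sink-events"

    # Recreate the sink with the same name (hence the same consumer
    # group) on the new cluster, reusing the old connector's config.
    config = requests.get(f"{OLD_CONNECT}/connectors/{NAME}/config").json()
    requests.post(f"{NEW_CONNECT}/connectors",
                  json={"name": NAME, "config": config}).raise_for_status()

    # Wait until the new connector reports RUNNING, then remove the old one.
    while True:
        status = requests.get(f"{NEW_CONNECT}/connectors/{NAME}/status").json()
        if status["connector"]["state"] == "RUNNING":
            break
        time.sleep(5)

    requests.delete(f"{OLD_CONNECT}/connectors/{NAME}").raise_for_status()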

I'm not familiar with using a Kafka Connect cluster, but I understand that it is a cluster of connectors that is independent of the Kafka cluster itself.

In that case, since the connectors are using the same Kafka cluster and you are just moving them from ECS to k8s, it should work as you describe. The consumer offsets and the internal Kafka Connect offsets are both stored in the Kafka cluster itself, so it doesn't really matter where the connectors run as long as they connect to the same Kafka cluster. They should resume from the same position, or behave as additional replicas of the same connector, regardless of where they are running.
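
One way to confirm this while both clusters are still running the connector is to describe the group and look at its members; workers from both ECS and Kubernetes should show up in the same group. A sketch with kafka-python (broker address and group name are illustrative):

    from kafka.admin import KafkaAdminClient

    admin = KafkaAdminClient(bootstrap_servers="kafka-broker:9092")

    # During the overlap window, members from both the ECS and the
    # Kubernetes Connect workers appear under the one consumer group.
    groups = admin.describe_consumer_groups(["connect-s3-sink-events"])
    for member in groups[0].members:
        print(member.client_id, member.client_host)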
