
Kafka-Connect: Creating a new connector in distributed mode is creating new group

I am currently working with the Confluent 3.0.1 platform. I am trying to create two connectors on two different workers, but each new connector I create gets a new group of its own.

The two connectors were created with the following requests:

1) POST http://devmetric.com:8083/connectors

{
    "name": "connector1",
    "config": {
        "connector.class": "com.xxx.kafka.connect.sink.DeliverySinkConnector",
        "tasks.max": "1",
        "topics": "dev.ps_primary_delivery",
        "elasticsearch.cluster.name": "ad_metrics_store",
        "elasticsearch.hosts": "devkafka1.com:9300",
        "elasticsearch.bulk.size": "100",
        "tenants": "tenant1"
    }
}

2) POST http://devkafka01.com:8083/connectors

{
    "name": "connector2",
    "config": {
        "connector.class": "com.xxx.kafka.connect.sink.DeliverySinkConnector",
        "tasks.max": "1",
        "topics": "dev.ps_primary_delivery",
        "elasticsearch.cluster.name": "ad_metrics_store",
        "elasticsearch.hosts": "devkafka.com:9300",
        "elasticsearch.bulk.size": "100",
        "tenants": "tenant1"
    }
}

But the two connectors were created under different group IDs. After this, I queried the existing groups:

$ sh ./bin/kafka-consumer-groups --bootstrap-server devmetric.com:9091  --new-consumer  --list

Result was:
connect-connector2
connect-connector1

These groups were created by Kafka Connect automatically; I did not specify them. I had set a different group.id in worker.properties. But I wanted both connectors to be in the same group so that they work in parallel and share the messages. As of now I have 1 million records on the topic "dev.ps_primary_delivery", and I want each connector to get 0.5 million.

Please let me know how to do this.

I think some clarification is required...

  1. group.id in the worker.properties file does not refer to consumer groups. It is a "worker group": multiple workers in the same worker group split work between them, so if one connector has many tasks (for example, the JDBC connector can run up to one task per table), those tasks are allocated across all the workers in the group.

  2. Sink connectors do have consumers that are part of a consumer group. The group.id of this group is always "connect-" + connector name. In your case, you got "connect-connector1" and "connect-connector2" based on your connector names. This also means that the only way two connectors could be in the same group is if they had the same name. But names are unique, so you can't have two connectors in the same group. The reason is...

  3. Connectors don't really get events themselves; they just start a bunch of tasks. Each task has consumers that are part of the connector's consumer group, and each task handles a subset of the topics and partitions independently. So having two connectors in the same group would basically mean that all their tasks are part of the same group, so why would you need two connectors? Just configure more topics and more tasks for that one connector and you are all set.

The only exception is if the connector you are using doesn't use tasks correctly or limits you to just one task. In that case, either its authors have a good reason or (more likely) someone needs to improve the connector...
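As a rough sketch of that suggestion, assuming DeliverySinkConnector actually honors tasks.max and the topic has at least two partitions, a single connector could be created with two tasks instead of creating two connectors. The settings below are copied from connector1 in the question; only tasks.max is changed. Both tasks would then join the one consumer group connect-connector1, and the topic's partitions would be divided between them:

```json
{
    "name": "connector1",
    "config": {
        "connector.class": "com.xxx.kafka.connect.sink.DeliverySinkConnector",
        "tasks.max": "2",
        "topics": "dev.ps_primary_delivery",
        "elasticsearch.cluster.name": "ad_metrics_store",
        "elasticsearch.hosts": "devkafka1.com:9300",
        "elasticsearch.bulk.size": "100",
        "tenants": "tenant1"
    }
}
```

The work is divided by partition, not by message count, so an even 0.5 million each is only likely if the data is spread evenly across the partitions. With a single partition, only one task receives data.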

You can set consumer.group.id to a value that Kafka Connect will pick up and use as the group.id for the entire application.

Advantage: you get a single consumer group that your application connects to.
Disadvantage: you need to be careful with the consumer group configuration; make it identical everywhere.
