
Kafka Connect (standalone) writing data to multiple partitions

I'm trying to use Kafka Connect to write data in standalone mode. The topic I'm writing the data to has multiple partitions. However, the data is being written to only one of the partitions. When I start multiple consumer consoles, the data is printed to only one of them; the other consumer console only gets data after the first one is closed. I can't figure out what change I need to make in the configuration file to make it write to multiple partitions.

Here is the standalone.properties:

bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true

internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=1000
rest.port=8084

connect-file-source.properties:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test4.txt
topic=consumer_group

Now I'm using the following command to run the connector:

bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties

Using the following to start consumer consoles:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic consumer_group --from-beginning --consumer-property group.id=new-consumer-group

It keeps printing data to only one of the consumer consoles. However, if I use a producer console instead of Kafka Connect to write messages, then I can see the messages on multiple consumers (in round-robin fashion), the way it should be. But with Kafka Connect, all the data is written to a single partition and the other consumers in the same group have to sit idle. What needs to be changed to make it write to all partitions in a round-robin fashion?

This answer applies to Apache Kafka 0.10.2.1, but may not necessarily apply to future versions.

As you may know, the file source connector generates messages with a null key and a null topic partition number. That means it is up to Kafka Connect's producer to assign a topic partition using its partitioner, and for messages with a null key the default partitioner will attempt to round-robin the messages across the available partitions.
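For illustration (this sketch is not part of the original setup and assumes a local broker plus the standard Kafka Java client), the following minimal program sends a few null-key records to the question's topic and prints which partition each one lands on; with null keys the partition number should vary:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class NullKeyPartitionCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Send a handful of records with a null key and print the assigned partition
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 6; i++) {
                RecordMetadata md = producer.send(
                        new ProducerRecord<String, String>("consumer_group", null, "line-" + i)).get();
                System.out.println("message " + i + " -> partition " + md.partition());
            }
        }
    }
}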

However, you're running into one of the quirks of the JSON converter, which is configured in the standalone.properties file via the key.converter and value.converter properties:

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true

When the JSON converter is configured with schemas enabled, the JSON representation includes an envelope around the value, so that the key or value contains both the schema and the payload:

{
    "schema": ...,
    "payload": ...
}
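For example, a line such as hello world read by the file source (which produces plain string values) would be serialized by the schema-enabled converter roughly as:

{
    "schema": { "type": "string", "optional": false },
    "payload": "hello world"
}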

Your standalone.properties file configures the key's converter with schemas enabled, so even though the connector generates messages with null keys and null schemas, the JSON converter (with schemas enabled) always wraps them in an envelope. Thus, every message's key will be:

{
    "schema": null,
    "payload": null
}

The producer's default partitioner will always hash these identical keys to the same partition.
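This mirrors the computation the default partitioner performs for non-null keys. As a rough sketch (using Kafka's own hashing helpers and a hypothetical partition count of 3), identical key bytes always produce the same result:

import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.utils.Utils;

public class SameKeySamePartition {
    public static void main(String[] args) {
        // Every record carries the identical serialized key, so the hash
        // (and therefore the chosen partition) never changes.
        byte[] keyBytes = "{\"schema\":null,\"payload\":null}".getBytes(StandardCharsets.UTF_8);
        int numPartitions = 3; // hypothetical partition count
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("always partition " + partition);
    }
}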

To change this behavior, edit your standalone.properties file and change the key.converter.schemas.enable property to false:

key.converter.schemas.enable=false

You can optionally change the value.converter.schemas.enable property to false as well, so that the value is written without being wrapped in the envelope and without including the schema:

value.converter.schemas.enable=false
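With schemas disabled, the converter writes only the payload, so the hello world line from the earlier example would be serialized as just a JSON string:

"hello world"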

This also plays into how the converters deal with null values, which some connectors generate when the source entity with a particular key is removed. For example, some change data capture connectors do this when a row is deleted from the source database. This works great with log-compacted topics, since each message represents the last known state of the keyed entity, and a null value corresponds to a tombstone record telling Kafka that all messages with the same key prior to that tombstone can be removed from the log. But a value converter configured as a JSON converter with schemas enabled will never output a null message value, so log compaction never removes the tombstone message. It's a minor issue, but one to be aware of.
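(As an aside, if you want to experiment with compaction, a compacted topic can be created with something like the following; this is a sketch for Kafka 0.10.x that assumes ZooKeeper on localhost:2181 and uses a hypothetical topic name.)

bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic customers --partitions 3 --replication-factor 1 --config cleanup.policy=compact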

If you want to encode your keys and values in JSON, then chances are you won't need or want the schemas, and can thus turn off schemas.enable for both the key and value JSON converters.

For those really using schemas, consider using Confluent's Schema Registry and the Avro converters. Not only are the encoded messages significantly smaller (due to the Avro encoding rather than JSON string encoding), but they also include the ID of the Avro schema, which allows you to evolve your message schemas over time without having to coordinate upgrading your producers and consumers to use the exact same schemas. There are all kinds of advantages!
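As a sketch, the relevant worker properties would look something like the following, assuming Confluent's Avro converter is on the worker's classpath and a Schema Registry is running at localhost:8081:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081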
