
Flume: Routing events to the proper topic partition with Kafka channel

In Flume, when using a Kafka channel, is there a way to influence which partition an event is sent to?

With the Kafka sink, the key FlumeEvent header is apparently used to choose a partition, but I could not find any documentation regarding partitions with the Kafka channel.

The channel does not have to worry about partitions. The channel both writes the messages and consumes them, so there is no need to partition them. This is how flume-kafka-channel creates the message for writing:

new KeyedMessage<String, byte[]>(topic.get(), null,
              batchUUID, event)

But if your topic has more than one partition, the lack of a key results in messages being sprayed across the available partitions.

If you want more control over how messages are distributed across partitions, look into Kafka's concept of a custom partitioner: create a class implementing the org.apache.kafka.clients.producer.Partitioner interface, set the partitioner.class property to the name of your class, and make sure the class is available on your classpath. That way you get control of every message before it is published and can decide which partition it should go to. You could set the property kafka.partitioner.class in your Flume channel configuration so that it gets picked up.
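To make the idea concrete, here is a minimal, dependency-free sketch of the kind of logic a custom partitioner might contain. The class name KeyPartitionSketch and the method partitionFor are hypothetical, and the hash is plain Arrays.hashCode rather than whatever Kafka's own default partitioner uses. In a real implementation of org.apache.kafka.clients.producer.Partitioner this logic would sit inside the partition(...) method, with the partition count taken from the cluster metadata. Which partitioner interface the channel actually honors depends on the producer API bundled with your Flume version, which is not confirmed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: shows key-to-partition mapping only, no Kafka types involved. */
final class KeyPartitionSketch {
    private KeyPartitionSketch() {}

    /**
     * Maps a key to a partition index in [0, numPartitions).
     * A null key is sent to partition 0 here (an assumption for the sketch);
     * a real partitioner might pick a random or round-robin partition instead.
     */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (key == null) {
            return 0;
        }
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }
}
```

The point of the sketch is that the same key always lands on the same partition, which is the property you need if related events must stay together.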

The Kafka channel for Flume does not support mapping an Event header to a partition key out of the box like KafkaSink does.

However, modifying it to do so is not too complicated. As I am not sure I can share the code, I will just give directions:

  1. Add a configuration key for the name of the header that will be mapped to the partition key.
  2. In the inner class KafkaTransaction, replace byte[] in the type of the member serializedEvents with something that can also hold a String key for each event (either an inner class, or even a Kafka KeyedMessage<String, byte[]>).
  3. In the method KafkaTransaction.doPut(Event event), retrieve the key from the headers and store it in serializedEvents together with the serialized message.
  4. In the method KafkaTransaction.doCommit(), use the key stored with each serialized event instead of batchUUID.
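
The four steps above can be sketched roughly as follows. This is not the Flume source: KeyedTransactionSketch and KeyedPayload are hypothetical stand-ins for KafkaTransaction and its buffer, events are reduced to a header map plus a serialized body, and doCommit only returns the buffered pairs instead of handing them to a Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the transaction's event buffer; not a Flume class. */
final class KeyedTransactionSketch {

    /** Step 2: a serialized event together with its (possibly null) partition key. */
    static final class KeyedPayload {
        final String key;
        final byte[] body;

        KeyedPayload(String key, byte[] body) {
            this.key = key;
            this.body = body;
        }
    }

    // Step 1: name of the header carrying the partition key (would come from configuration).
    private final String keyHeader;
    private final List<KeyedPayload> serializedEvents = new ArrayList<>();

    KeyedTransactionSketch(String keyHeader) {
        this.keyHeader = keyHeader;
    }

    /** Step 3: look up the key in the event headers and buffer it with the serialized body. */
    void doPut(Map<String, String> headers, byte[] serializedBody) {
        serializedEvents.add(new KeyedPayload(headers.get(keyHeader), serializedBody));
    }

    /**
     * Step 4: return the buffered events, each carrying its own key, and clear the buffer.
     * A real channel would send each pair to the producer here instead of a shared batch id.
     */
    List<KeyedPayload> doCommit() {
        List<KeyedPayload> batch = new ArrayList<>(serializedEvents);
        serializedEvents.clear();
        return batch;
    }
}
```

Events without the configured header end up with a null key in this sketch; how those should be partitioned is a design choice left open here.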

NOTE that events in a transaction will no longer be guaranteed to be processed by a single KafkaChannel instance at the consumer end of the channel, so you'll have to check that this is compatible with your use case (regarding transaction size, etc.).
