
Flume: Routing events to the proper topic partition with Kafka channel

In Flume, when using a Kafka channel, is there a way to influence what partition an event is sent to?

With the Kafka sink, the key FlumeEvent header is apparently used to choose a partition, but I could not find any documentation regarding partitions with the Kafka channel.

The channel does not have to worry about partitioning: the same channel both writes and consumes the messages, so there is no need to partition them. This is how flume-kafka-channel creates a message for writing:

// KeyedMessage(topic, key, partKey, message): the message key is null
// and the batch UUID is used as the partition key
new KeyedMessage<String, byte[]>(topic.get(), null, batchUUID, event)

But if your topic has more than one partition, the lack of a key will result in messages being sprayed across the available partitions.

If you want more control over how the messages are distributed across partitions, look into Kafka's custom partitioner concept: create a class implementing the org.apache.kafka.clients.producer.Partitioner interface, set the partitioner.class property to the fully qualified name of your class, and make sure the class is on your classpath. That gives you a hook for every message before it is published, where you can decide which partition the message should go to. Set the property kafka.partitioner.class in your Flume channel configuration so that it gets picked up.
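As a sketch of the idea (the class and method names below are illustrative, not part of Kafka or Flume): the core of a custom partitioner is a deterministic mapping from a message key to a partition index, for example:

```java
// Minimal, self-contained sketch of the partition-selection logic a custom
// partitioner might use. A real implementation would put this logic inside
// org.apache.kafka.clients.producer.Partitioner#partition(...).
public class PartitionerSketch {

    // Map a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (key == null) {
            return 0; // hypothetical fallback; Kafka's default would spread keyless messages
        }
        // Mask off the sign bit: hashCode() may be negative (even Integer.MIN_VALUE,
        // for which Math.abs would still be negative).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("user-42", 6)); // same key always maps to the same partition
        System.out.println(partitionFor(null, 6));
    }
}
```

In a real partitioner, the same logic would live in partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster), with the partition count taken from cluster.partitionsForTopic(topic).size().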

The Kafka channel for Flume does not support mapping an Event header to a partition key out-of-the-box like KafkaSink does.

However, modifying it so that it does is not too complicated. As I am not sure I can share the code, I will just give directions:

  1. add a configuration key for the name of the header that will be mapped to the partition key
  2. in the inner class KafkaTransaction, replace byte[] as the element type of member serializedEvents with something that can also hold a String key for every event (either an inner class, or even a Kafka KeyedMessage<String, byte[]>)
  3. in method KafkaTransaction.doPut(Event event), retrieve the key from the headers and store it in serializedEvents together with the serialized message
  4. in method KafkaTransaction.doCommit(), use the key stored with each serialized event instead of batchUUID.
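The steps above can be sketched roughly as follows. This is not Flume's actual code: the class names, the KEY_HEADER constant, and the method shapes are all hypothetical stand-ins for the corresponding pieces of KafkaTransaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 2-4: keep a per-event key next to each serialized
// event instead of a bare byte[], so that commit time can publish with that key.
public class KeyedChannelSketch {

    static final String KEY_HEADER = "key"; // step 1: would come from configuration

    // step 2: replacement for the plain byte[] element type of serializedEvents
    static class KeyedSerializedEvent {
        final String key;      // partition key taken from the event header (may be null)
        final byte[] payload;  // the serialized event body

        KeyedSerializedEvent(String key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    private final List<KeyedSerializedEvent> serializedEvents = new ArrayList<>();

    // step 3: on put, pull the key out of the event headers and store it
    // alongside the serialized message
    void doPut(Map<String, String> headers, byte[] serializedBody) {
        serializedEvents.add(new KeyedSerializedEvent(headers.get(KEY_HEADER), serializedBody));
    }

    // step 4: on commit, hand back the stored key with each event; in Flume this is
    // where each pair would go into a KeyedMessage (key instead of batchUUID)
    List<KeyedSerializedEvent> doCommit() {
        List<KeyedSerializedEvent> batch = new ArrayList<>(serializedEvents);
        serializedEvents.clear();
        return batch;
    }
}
```

With this in place, events carrying different header values are published under different keys, so Kafka's partitioner can route them to different partitions within one transaction.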

NOTE that events in a transaction will no longer be guaranteed to be processed by a single KafkaChannel instance at the consumer end of the channel, so you'll have to check that it is compatible with your use case (regarding transaction size, etc).
