
Using kafka-streams with custom partitioner

I want to join a KStream with a KTable. Both have a different key but are co-partitioned using a custom partitioner. However, the join does not produce any results.

The KStream has the following structure:
- key: House - Group
- value: User

The KTable has the following structure:
- key: User - Group
- value: Address

To make sure inserts into both topics are processed in insertion order, I'm using a custom partitioner that partitions both topics by the Group part of each key.
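The partitioner itself looks roughly like the sketch below (simplified and untested; GroupedKey is a hypothetical common interface standing in for the group field that HouseGroup and UserGroup share):

import org.apache.kafka.clients.producer.Partitioner
import org.apache.kafka.common.Cluster

// Hypothetical common view of both key classes: each exposes its group part.
interface GroupedKey { val group: Any }

// Simplified sketch: assign partitions from the Group part of the key only,
// so records for the same group land in the same partition of both topics.
class CustomPartitioner : Partitioner {
    override fun partition(
        topic: String, key: Any?, keyBytes: ByteArray?,
        value: Any?, valueBytes: ByteArray?, cluster: Cluster
    ): Int {
        val numPartitions = cluster.partitionsForTopic(topic).size
        val group = (key as GroupedKey).group
        // Mask the sign bit so the result is always a valid partition index.
        return (group.hashCode() and Int.MAX_VALUE) % numPartitions
    }

    override fun close() {}
    override fun configure(configs: MutableMap<String, *>?) {}
}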

I want to end up with a stream of the following structure:
- key: House - Group
- value: User - Address

For this I'm doing the following:

val streamsBuilder = streamBuilderHolder.streamsBuilder
val houseToUser = streamsBuilder.stream<HouseGroup, User>("houseToUser")
val userToAddress = streamsBuilder.table<UserGroup, Address>("userToAddress")
val result: KStream<HouseGroup, UserWithAddress> = houseToUser
        // Rekey the stream to the KTable's key type (User - Group).
        .map { k: HouseGroup, v: User ->
            val newKey = UserGroup(v, k.group)
            val newVal = UserHouse(v, k.house)
            KeyValue(newKey, newVal)
        }
        // Join the rekeyed stream against the KTable.
        .join(userToAddress) { v1: UserHouse, v2: Address ->
            UserHouseWithAddress(v1, v2)
        }
        // Map back to the desired output key (House - Group).
        .map { k: UserGroup, v: UserHouseWithAddress ->
            val newKey = HouseGroup(v.house, k.group)
            val newVal = UserWithAddress(k.user, v.address)
            KeyValue(newKey, newVal)
        }

I expected this to produce a matching join, but it did not work.

I guess the obvious solution is to join with a GlobalKTable and drop the custom partitioner. However, I still don't understand why the approach above does not work.
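For reference, that alternative would look roughly like the sketch below (untested, reusing streamsBuilder, houseToUser, and the classes from my snippet above; a GlobalKTable join has no co-partitioning requirement because every application instance holds a full copy of the table):

import org.apache.kafka.streams.kstream.GlobalKTable
import org.apache.kafka.streams.kstream.KStream

val userToAddressGlobal: GlobalKTable<UserGroup, Address> =
        streamsBuilder.globalTable("userToAddress")

val resultViaGlobalTable: KStream<HouseGroup, UserWithAddress> = houseToUser
        .join(userToAddressGlobal,
                // Derive the table's key from each stream record; no repartitioning occurs.
                { k: HouseGroup, v: User -> UserGroup(v, k.group) },
                // Combine the stream value with the matching table value.
                { user: User, address: Address -> UserWithAddress(user, address) })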

I think the lack of matching is caused by different partitioners being used.

For your input topics your CustomPartitioner is used. Kafka Streams by default uses org.apache.kafka.clients.producer.internals.DefaultPartitioner.

In your code, just before KStream::join you have called KStream::map. The KStream::map call enforces repartitioning before KStream::join. During repartitioning, messages are flushed to Kafka (into the $AppName-KSTREAM-MAP-000000000X-repartition topic). To spread those messages over partitions, Kafka Streams uses the configured partitioner (property: ProducerConfig.PARTITIONER_CLASS_CONFIG). Summarizing: messages with the same key might end up in a different partition for the repartition topic than for the KTable topic.

The solution in your case is to set your custom partitioner in the properties of your Kafka Streams application: props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.CustomPartitioner").
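In a Kotlin setup like the question's, that configuration might look as follows (application id and bootstrap servers are placeholders):

import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.streams.StreamsConfig
import java.util.Properties

val props = Properties().apply {
    put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app")      // placeholder
    put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // placeholder
    // Producer-level configs are passed through to Kafka Streams' internal
    // producers, so repartition topics are written with the custom partitioner too.
    put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.CustomPartitioner")
}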

For debugging you can inspect the repartition topic ($AppName-KSTREAM-MAP-000000000X-repartition). Messages with the same keys as in the input topic might sit in different partitions (different partition numbers).
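A throwaway consumer like the sketch below can show that (the topic name is the placeholder pattern from above; the deserializers are an assumption and must match your actual key/value serdes):

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.ByteArrayDeserializer
import java.time.Duration
import java.util.Properties

fun main() {
    val props = Properties().apply {
        put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")   // placeholder
        put(ConsumerConfig.GROUP_ID_CONFIG, "repartition-debug")         // throwaway group id
        put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer::class.java.name)
        put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer::class.java.name)
        put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    }
    KafkaConsumer<ByteArray, ByteArray>(props).use { consumer ->
        // Substitute your application's actual repartition topic name here.
        consumer.subscribe(listOf("\$AppName-KSTREAM-MAP-000000000X-repartition"))
        while (true) {
            for (record in consumer.poll(Duration.ofSeconds(1))) {
                // The partition number is what matters for this check.
                println("partition=${record.partition()} key=${String(record.key())}")
            }
        }
    }
}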

Documentation about join co-partitioning requirements

Try this; it works for me.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Confluent.Kafka;
using Confluent.Kafka.Admin;
using Confluent.SchemaRegistry;
using Confluent.SchemaRegistry.Serdes;

static async System.Threading.Tasks.Task Main(string[] args)
{
    int count = 0;
    string line = null;

    var appConfig = getAppConfig(Enviroment.Dev);
    var schemaRegistryConfig = getSchemmaRegistryConfig(appConfig);
    var registry = new CachedSchemaRegistryClient(schemaRegistryConfig);
    var serializer = new AvroSerializer<YourAvroSchemaClass>(registry);

    // Create the output topic with the desired number of partitions.
    var adminClient = new AdminClientBuilder(new AdminClientConfig(getClientConfig(appConfig))).Build();
    var topics = new List<TopicSpecification>
    {
        new TopicSpecification { Name = appConfig.OutputTopic, NumPartitions = 11 }
    };
    await adminClient.CreateTopicsAsync(topics);

    var producerConfig = getProducerConfig(appConfig);

    // Register a custom partitioner for the output topic: interpret the key
    // as an integer and take it modulo the partition count.
    var producer = new ProducerBuilder<string, byte[]>(producerConfig)
        .SetPartitioner(appConfig.OutputTopic, (string topicName, int partitionCount, ReadOnlySpan<byte> keyData, bool keyIsNull) =>
        {
            var keyValueInInt = Convert.ToInt32(System.Text.Encoding.UTF8.GetString(keyData.ToArray()));
            return (Partition)(keyValueInInt % partitionCount);
        })
        .Build();

    using (producer)
    {
        Console.WriteLine($"Start to load data from: {appConfig.DataFileName}: {DateTime.Now}");
        var watch = new Stopwatch();
        watch.Start();
        try
        {
            using var stream = new StreamReader(appConfig.DataFileName);
            while ((line = stream.ReadLine()) != null)
            {
                var message = parseLine(line);
                var data = await serializer.SerializeAsync(message.Value,
                    new SerializationContext(MessageComponentType.Value, appConfig.OutputTopic));
                producer.Produce(appConfig.OutputTopic, new Message<string, byte[]> { Key = message.Key, Value = data });

                // Flush every 1000 messages and report progress.
                if (count++ % 1000 == 0)
                {
                    producer.Flush();
                    Console.WriteLine($"Write ... {count} in {watch.Elapsed.TotalSeconds} seconds");
                }
            }
            producer.Flush();
        }
        catch (ProduceException<string, byte[]> e) // key/value types must match the producer
        {
            Console.WriteLine($"Line: {line}");
            Console.WriteLine($"Delivery failed: {e.Error.Reason}");
            System.Environment.Exit(101);
        }
        finally
        {
            producer.Flush();
        }
    }
}
