
Kafka Connect to persist topic to Elasticsearch index using field from (json) message

I'm attempting to index messages in Elasticsearch using only SMTs from Kafka's Connect API.

So far I've had luck with simply using the topic and timestamp router functionality. However, now I'd like to create separate indices based on a certain field in the message.

Suppose the messages are formatted as such:

{"productId": 1, "category": "boat", "price": 135000}
{"productId": 1, "category": "helicopter", "price": 300000}
{"productId": 1, "category": "car", "price": 25000}

Would it somehow be possible to index these to the following indices based on product category?

  • product-boat
  • product-helicopter
  • product-car

or would I have to create separate topics for every single category (knowing that it could become hundreds or thousands of them)?

Am I overlooking a transform that could do this, or is this simply not possible and will a custom component have to be built?

There's nothing out of the box with Kafka Connect that will do this. You have a few options:

  1. The Elasticsearch sink connector will route messages to a target index based on their topic, so you could write a custom SMT that inspects a message and routes it to a different topic accordingly.
  2. Use a stream processor to pre-process the messages such that they're already on different topics by the time they are consumed by the Elasticsearch sink connector, for example Kafka Streams or KSQL.
    • With KSQL you would need to hard-code each category (CREATE STREAM product-boat AS SELECT * FROM messages WHERE category='boat', etc.)
    • Kafka Streams now has Dynamic Routing (KIP-303), which would be a more flexible way of doing it; see the sketch after this list.
  3. Hand-code a bespoke Elasticsearch sink connector with the logic coded in to route the messages to indices based on message contents. This feels like the worst of the three approaches, IMO.
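
As a rough illustration of the Kafka Streams option, KIP-303's TopicNameExtractor lets you derive the target topic from each record. Below is a minimal sketch assuming string-serialized JSON values, a source topic named "products", and Jackson on the classpath; it illustrates the approach, not a production implementation.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;

import com.fasterxml.jackson.databind.ObjectMapper;

public class CategoryRouter {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "category-router");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("products", Consumed.with(Serdes.String(), Serdes.String()))
               // The TopicNameExtractor lambda is evaluated per record, so new
               // categories are routed without code changes (the target topics
               // must exist, or broker-side auto-creation must be enabled).
               .to((key, value, recordContext) -> "product-" + categoryOf(value));

        new KafkaStreams(builder.build(), props).start();
    }

    // Pull the "category" field out of the JSON value; fall back to "unknown"
    // so a malformed record doesn't kill the stream.
    private static String categoryOf(String json) {
        try {
            return MAPPER.readTree(json).get("category").asText();
        } catch (Exception e) {
            return "unknown";
        }
    }
}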

If you are using Confluent Platform, you can do some routing based on a field value in the message.

To do that you have to use the ExtractTopic SMT from Confluent. More details regarding that SMT can be found at https://docs.confluent.io/current/connect/transforms/extracttopic.html#extracttopic

A Kafka sink connector processes messages that are represented by SinkRecord. Each SinkRecord contains several fields: topic, partition, value, key, etc. Those fields are set by Kafka Connect, and using transformations you can change their values. The ExtractTopic SMT changes the value of topic based on the value or key of the message.
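
To make that mechanic concrete, here is an illustrative sketch (not Confluent's actual ExtractTopic implementation) of how a transform can swap the topic of a record via ConnectRecord#newRecord. The field name "category" and the "product-" prefix are assumptions taken from the question, and schemaless JSON values (deserialized to a Map) are assumed.

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class TopicFromCategory<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        // Assumes schemaless JSON, so the value arrives as a Map.
        @SuppressWarnings("unchecked")
        Map<String, Object> value = (Map<String, Object>) record.value();

        // newRecord() copies every field of the record except the ones we
        // override -- here, only the topic. The Elasticsearch sink connector
        // then uses the rewritten topic as the target index name.
        return record.newRecord("product-" + value.get("category"),
                record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), record.value(),
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch.
    }

    @Override
    public void close() {
        // Nothing to clean up.
    }
}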

The transformation configuration will look something like this:

{
...
    "transforms": "ExtractTopic",
    "transforms.ExtractTopic.type": "io.confluent.connect.transforms.ExtractTopic$Value",
    "transforms.ExtractTopic.field": "name",
...
}

Here "name" is the field whose value will be used as the index name. Note that ExtractTopic takes the field value verbatim, so with the question's messages you would set the field to category and get indices like boat rather than product-boat, unless the field already contains the full name.

One limitation is that you have to create the indices in advance.

I assume you are using the Elasticsearch Sink Connector. The Elasticsearch connector has the ability to create indices, but it does so when it opens writers for particular partitions (ElasticsearchSinkTask::open). In your use case, not all indices can be created at that moment, because the values of all the messages are not yet available.

Maybe it isn't the purest approach, because ExtractTopic should rather be used for source connectors, but in your case it might work.
