Kafka Connect to persist topic to Elasticsearch index using field from (json) message
I'm attempting to index messages in Elasticsearch using only SMTs from Kafka's Connect API.
So far I've had luck simply using the topic and timestamp router functionality. However, now I'd like to create separate indices based on a certain field in the message.
Suppose the messages are formatted as such:
{"productId": 1, "category": "boat", "price": 135000}
{"productId": 1, "category": "helicopter", "price": 300000}
{"productId": 1, "category": "car", "price": 25000}
Would it somehow be possible to index these into separate indices based on the product category, or would I have to create a separate topic for every single category (knowing that there could be hundreds or thousands of them)?
Am I overlooking a transform that could do this, or is this simply not possible and will a custom component have to be built?
There's nothing out of the box with Kafka Connect that will do this. You have a few options; with KSQL, for example, you would need to hard-code each category:
CREATE STREAM product-boat AS SELECT * FROM messages WHERE category='boat'
and so on for every category.
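A minimal KSQL sketch of that hard-coded approach, assuming a messages stream has already been declared over the source topic (the stream and output topic names here are illustrative):

CREATE STREAM product_boat WITH (KAFKA_TOPIC='product-boat') AS
    SELECT * FROM messages WHERE category = 'boat';

CREATE STREAM product_helicopter WITH (KAFKA_TOPIC='product-helicopter') AS
    SELECT * FROM messages WHERE category = 'helicopter';

-- ...and one CREATE STREAM per additional category.

Each derived topic then has to be added to the Elasticsearch sink connector's topics list, which is exactly the per-category maintenance the question is trying to avoid.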
If you are using Confluent Platform you can do some routing based on a field value in the message. To do that you have to use the ExtractTopic SMT from Confluent. More details about that SMT can be found at https://docs.confluent.io/current/connect/transforms/extracttopic.html#extracttopic
A Kafka sink connector processes messages that are represented as SinkRecord objects. Each SinkRecord contains several fields: topic, partition, value, key, etc. Those fields are set by Kafka Connect, and with a transformation you can change their values. The ExtractTopic SMT changes the value of topic based on the value or key of the message.
The transformation configuration will look something like this:
{
  ...
  "transforms": "ExtractTopic",
  "transforms.ExtractTopic.type": "io.confluent.connect.transforms.ExtractTopic$Value",
  "transforms.ExtractTopic.field": "name",
  ...
}
where field is the name of the message field whose value will be used as the topic, and therefore the index, name.
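For the messages in the question that field would be category. A fuller, purely illustrative Elasticsearch sink configuration (topic name and connection settings are placeholders) might look like this:

{
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "topics": "products",
  "connection.url": "http://localhost:9200",
  "type.name": "_doc",
  "key.ignore": "true",
  "schema.ignore": "true",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false",
  "transforms": "ExtractTopic",
  "transforms.ExtractTopic.type": "io.confluent.connect.transforms.ExtractTopic$Value",
  "transforms.ExtractTopic.field": "category"
}

With ExtractTopic$Value the topic of each record is replaced by the value of its category field, so the records above would end up in indices named boat, helicopter and car.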
One limitation is that you have to create the indices in advance. I assume you are using the Elasticsearch Sink Connector. That connector is able to create indices, but it does so when it opens the writers for a particular partition (ElasticsearchSinkTask::open). In your use case the indices cannot all be created at that point, because the values of all messages are not yet available.
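If the set of categories is known, those indices could be created up front with the Elasticsearch REST API. Assuming the connector's default behaviour of using the (lowercased) topic name as the index name, the example data would need something like:

PUT /boat
PUT /helicopter
PUT /car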
Maybe it isn't the purest approach, because ExtractTopic is intended more for source connectors, but in your case it might work.