
Customize Kafka Connect - ElasticSearch Sink Connector

I have a Kafka topic with multiple types of messages flowing in, which are written to Elasticsearch using Kafka Connect. Streaming looks good until I have to separate each unique set of messages into its own index. That is, I have to derive the index for each set of data from fields in the JSON messages.

How do I configure/customize Kafka Connect to do this for me? Each message contains a field that represents the type of message, and a timestamp.

The sample JSON looks like:

Sample1: {"log":{"data":"information", "version":"1.1"}, "type":"xyz", "timestamp":"2019-08-28t10:07:40.370z", "value":{}}

Sample2: {"log":{"data":"information", "version":"1.1", "value":{}}, "type":"abc", "timestamp":"2019-08-28t10:07:40.370z" }

I would like to customize/configure the Kafka Connect ES sink to write the Sample1 doc to index 'xyz.20190828' and the Sample2 doc to index 'abc.20190828'.

I'm using Kafka 2.2.0 and the confluentinc-kafka-connect-elasticsearch-5.2.1 plugin.

Appreciate the help.

You could do this using a custom Single Message Transform (SMT), which you would need to write yourself. By changing the topic of a message based on its contents, you can route it to a different Elasticsearch index.

Currently Apache Kafka ships with SMTs that can rename entire topics (RegexRouter) or add timestamps (TimestampRouter). You may find these a useful starting point for writing your own.
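Here is a minimal sketch of what such a custom SMT could look like, assuming the record value is schemaless JSON that the converter has deserialized to a Map (e.g. JsonConverter with schemas.enable=false), and that the field names "type" and "timestamp" match the samples above. The package and class names are placeholders, not part of any existing library:

```java
package com.example.smt; // hypothetical package

import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Sketch of a custom SMT that renames the record's topic to
 * "<type>.<yyyyMMdd>", so the ES sink (which uses the topic name
 * as the index name by default) writes to a per-type, per-day index.
 */
public class TypeDateRouter<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        Object value = record.value();
        if (!(value instanceof Map)) {
            return record; // pass through anything we can't inspect
        }
        Map<?, ?> fields = (Map<?, ?>) value;
        Object type = fields.get("type");
        Object timestamp = fields.get("timestamp");
        if (type == null || timestamp == null) {
            return record;
        }
        // "2019-08-28t10:07:40.370z" -> "20190828"
        String date = timestamp.toString().substring(0, 10).replace("-", "");
        String newTopic = type + "." + date;
        return record.newRecord(newTopic, record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), record.value(),
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // no configuration options in this sketch
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // nothing to configure in this sketch
    }

    @Override
    public void close() {
    }
}
```

You would package this as a JAR, place it on the Connect worker's plugin path, and reference it from the sink configuration with the standard transform properties, e.g. transforms=route and transforms.route.type=com.example.smt.TypeDateRouter (the alias "route" and the class name are just the placeholders used above).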

The alternative is, as @wardzniak suggests in his comment, to use stream processing (e.g. Kafka Streams, KSQL) to pre-process the source topic before using Kafka Connect to send the resulting separate topics to Elasticsearch.
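For illustration, a minimal Kafka Streams sketch of that alternative could split the source topic by the "type" field into per-type output topics, each of which Kafka Connect then sinks to its own index. The topic names, bootstrap address, and the naive string match (rather than proper JSON parsing) are all assumptions made for brevity:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

/**
 * Sketch: route messages from one source topic into per-type topics,
 * which separate ES sink connectors can then write to separate indices.
 */
public class TypeSplitter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "type-splitter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("source-topic");

        // Naive string check for brevity; a real job would parse the JSON.
        source.filter((k, v) -> v.contains("\"type\":\"xyz\"")).to("xyz-topic");
        source.filter((k, v) -> v.contains("\"type\":\"abc\"")).to("abc-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}
```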
