
Kafka connect ElasticSearch sink - using if-else blocks to extract and transform fields for different topics

I have a Kafka Elasticsearch sink properties file like the following:

name=elasticsearch.sink.direct
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=16
topics=data.my_setting

connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.my_setting:direct_my_setting_index
batch.size=2048
max.buffered.records=32768
flush.timeout.ms=60000
max.retries=10
retry.backoff.ms=1000
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID

This works perfectly for a single topic (data.my_setting). I would like to use the same connector for data coming in from more than one topic. A message in a different topic will have a different key which I'll need to transform. I was wondering if there is a way to use if-else blocks, with a condition on the topic name or on a single field in the message, so that I can transform the key differently per topic. All the incoming messages are JSON with a schema and payload.

UPDATE based on the answer:

In my JDBC connector I add the key as follows:

name=data.my_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT * FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=investment.ed.data.app_setting

transforms=ValueToKey
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID

However, I still get the following error when a message produced by this connector is read by the Elasticsearch sink:

org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
Caused by: org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id

The payload looks like this:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": false,
        "field": "MY_SETTING_ID"
      },
      {
        "type": "string",
        "optional": true,
        "field": "MY_SETTING_NAME"
      }
    ],
    "optional": false
  },
  "payload": {
    "MY_SETTING_ID": 9,
    "MY_SETTING_NAME": "setting_name"
  }
}
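
Given that schema, ValueToKey on its own turns the record key into a struct ({"MY_SETTING_ID": 9}), which matches the "STRUCT is not supported as the document id" error. The sink config at the top chains ExtractField$Key after ValueToKey to flatten the key to the bare int32; the same chain on the source connector would look roughly like this (a sketch reusing the field name above):

transforms=ValueToKey,ExtractId
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID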

The Connect standalone properties file looks like this:

bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter 
value.converter=org.apache.kafka.connect.json.JsonConverter 
converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter 
internal.value.converter=org.apache.kafka.connect.json.JsonConverter 
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/apps/{env}/logs/infrastructure/offsets/connect.offsets
rest.port=8084
plugin.path=/usr/share/java

Is there a way to achieve my goal, which is to have messages from multiple topics (in my case, DB tables), each with its own unique id (which will also be the id of the document in ES), sent to a single ES sink?

Can I use Avro for this task? Is there a way to define the key in the Schema Registry, or will I run into the same problem?

This isn't possible. You'd need multiple Connectors if the key fields are different.
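
For example, alongside the sink shown at the top for data.my_setting, a second topic with a different key field would get its own sink connector repeating the same InsertKey/ExtractId pattern. The topic, index, and field names below are hypothetical; only the shape mirrors the config above:

name=elasticsearch.sink.other
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=16
topics=data.my_other_setting
connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.my_other_setting:direct_my_other_setting_index
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=MY_OTHER_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_OTHER_SETTING_ID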

One option to think about is pre-processing your Kafka topics through a stream processor (e.g. Kafka Streams, KSQL, Spark Streaming, etc.) to standardise the key fields, so that you can then use a single connector. Whether this would be worth doing or overkill depends on what you're building; a rough sketch follows below.
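
As an illustration of the Kafka Streams route (not the only way to do this), the sketch below re-keys each source topic on its own id field and writes the result to a new topic that a single Elasticsearch sink could then consume. The application id, the second topic, its id field, and the output topic names are all assumptions; the JSON handling assumes the schema/payload envelope shown in the question.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class RekeyForEsSink {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rekey-for-es-sink");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        ObjectMapper mapper = new ObjectMapper();

        // Each source topic is re-keyed on its own id field and written to a
        // standardised output topic; a single ES sink can subscribe to those.
        rekey(builder, mapper, "data.my_setting", "MY_SETTING_ID", "es.data.my_setting");
        rekey(builder, mapper, "data.my_other_setting", "MY_OTHER_SETTING_ID", "es.data.my_other_setting"); // hypothetical

        new KafkaStreams(builder.build(), props).start();
    }

    private static void rekey(StreamsBuilder builder, ObjectMapper mapper,
                              String sourceTopic, String idField, String targetTopic) {
        KStream<String, String> stream = builder.stream(sourceTopic);
        stream.selectKey((oldKey, value) -> {
            try {
                // Messages are JSON with a schema/payload envelope, so the id
                // sits under the "payload" node.
                JsonNode payload = mapper.readTree(value).get("payload");
                return payload.get(idField).asText();
            } catch (Exception e) {
                return null; // in a real pipeline, route malformed records elsewhere
            }
        }).to(targetTopic);
    }
}

A single sink subscribed to the es.* topics should then be able to use the record key as the document id directly, without any per-topic transforms.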
