Kafka Connect Elasticsearch sink - using if-else blocks to extract and transform fields for different topics
I have a Kafka Elasticsearch sink properties file like the following:
name=elasticsearch.sink.direct
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=16
topics=data.my_setting
connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.my_setting:direct_my_setting_index
batch.size=2048
max.buffered.records=32768
flush.timeout.ms=60000
max.retries=10
retry.backoff.ms=1000
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID
This works perfectly for a single topic (data.my_setting). I would like to use the same connector for data coming in from more than one topic. A message in a different topic will have a different key, which I'll need to transform. I was wondering if there is a way to use if-else logic, conditioned on the topic name or on a single field in the message, so that I can transform the key differently per topic. All incoming messages are JSON with schema and payload.
UPDATE based on the answer:
In my JDBC connector I add the key as follows:
name=data.my_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT * FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=investment.ed.data.app_setting
transforms=ValueToKey
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID
However, I still get the following error when a message produced by this connector is read by the Elasticsearch sink:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
Caused by: org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id
The payload looks like this:
{
  "schema": {
    "type": "struct",
    "fields": [{
      "type": "int32",
      "optional": false,
      "field": "MY_SETTING_ID"
    }, {
      "type": "string",
      "optional": true,
      "field": "MY_SETTING_NAME"
    }],
    "optional": false
  },
  "payload": {
    "MY_SETTING_ID": 9,
    "MY_SETTING_NAME": "setting_name"
  }
}
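The `STRUCT is not supported as the document id` exception is consistent with the record key still being a one-field struct: the JDBC source's `ValueToKey` wraps `MY_SETTING_ID` in a struct, and nothing extracts the primitive from it. A hedged fix (assuming the goal is a bare int32 key, mirroring the transform chain the sink config above already uses) would be to chain `ExtractField$Key` after `ValueToKey` in the source connector:

```properties
transforms=ValueToKey,ExtractId
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID
# ExtractField$Key pulls the single field back out of the key struct,
# leaving a primitive the ES sink can use as the document id
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID
```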
The Connect standalone properties file looks like this:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/apps/{env}/logs/infrastructure/offsets/connect.offsets
rest.port=8084
plugin.path=/usr/share/java
Is there a way to achieve my goal, which is to have messages from multiple topics (in my case, DB tables), each with its own unique ID (which will also be the ID of the document in ES), sent to a single ES sink?
Can I use Avro for this task? Is there a way to define the key in the Schema Registry, or will I run into the same problem?
This isn't possible. You'd need multiple connectors if the key fields are different.
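Concretely, the multiple-connector route means one sink config per topic/key pair. A minimal sketch, assuming a second topic `data.other_setting` keyed on a hypothetical `OTHER_SETTING_ID` column (both names are illustrative only):

```properties
name=elasticsearch.sink.other
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=data.other_setting
topic.index.map=data.other_setting:direct_other_setting_index
schema.ignore=true
# Same transform pattern as the first sink, but keyed on this topic's id field
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=OTHER_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=OTHER_SETTING_ID
```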
One option to consider is pre-processing your Kafka topics through a stream processor (e.g. Kafka Streams, KSQL, Spark Streaming, etc.) to standardise the key fields, so that you can then use a single connector. Whether this is worth doing, or overkill, depends on what you're building.
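For the KSQL variant of that pre-processing step, a minimal sketch (stream and column names are hypothetical, and the schema is assumed from the payload shown in the question):

```sql
-- Register the source topic as a stream
CREATE STREAM my_setting_src (MY_SETTING_ID INT, MY_SETTING_NAME VARCHAR)
  WITH (KAFKA_TOPIC='data.my_setting', VALUE_FORMAT='JSON');

-- Re-key onto the id column; repeating this per topic gives every
-- output topic the same key shape, so one ES sink can consume them all
CREATE STREAM my_setting_rekeyed AS
  SELECT * FROM my_setting_src
  PARTITION BY MY_SETTING_ID;
```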