Kafka Connect Elasticsearch sink: extract and transform values from JSON
I use the Elasticsearch Sink connector to stream data from Kafka to Elasticsearch, and I have a question.
I have the following record structure in the Kafka topic document:
Partition: 0
Offset: 0
Key:
Value:
{
  "attributes": {
    "3": "Mike"
  }
}
Timestamp: 2022-11-03 19:03:34.866
For this data I have the following index template in my Elasticsearch:
{
  "version": 1,
  "index_patterns": [
    "documents-*"
  ],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "cashier": {
        "type": "text"
      }
    }
  }
}
And I have the following Elasticsearch Sink Connector configuration:
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "document, document-processing-error",
    "key.ignore": "true",
    "schema.ignore": "true",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "name": "elasticsearch-sink",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "flush.synchronously": "true",
    "transforms": "appendTimestampToIX",
    "transforms.appendTimestampToIX.type": "org.apache.kafka.connect.transforms.TimestampRouter",
    "transforms.appendTimestampToIX.topic.format": "${topic}-${timestamp}",
    "transforms.appendTimestampToIX.timestamp.format": "yyyy-MM-dd"
  }
}
In the output I have the following data in my index document-2022-11-03:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "document-2022-11-03",
        "_type": "_doc",
        "_id": "document-2022-11-03+0+0",
        "_score": 1.0,
        "_source": {
          "attributes": {
            "3": "Mike"
          }
        }
      }
    ]
  }
}
This works fine, but I need an extra transformation of my data: if attributes contains the key 3, I need to replace that field with the key cashier and flatten the structure, giving the document a random id. So in the end I need output like the following (for example):
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "document-2022-11-03",
        "_type": "_doc",
        "_id": "134DaBfWAE6AZUyKUAbjRksjXHTmP6hDxedGm4YhBnZW",
        "_score": 1.0,
        "_source": {
          "cashier": "Mike"
        }
      }
    ]
  }
}
I tried the following config to rename the field, but this doesn't work for me:
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "arrtubites.3:cashier"
How can I do this?
The ReplaceField transform does not work with nested attributes such as Maps or Objects, only with top-level fields of either.
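For example, if the record value already had a top-level field (say, a hypothetical name field), a rename like this would work; it just cannot reach inside the nested attributes object:

"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "name:cashier"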
If you want to convert
{
  "attributes": {
    "3": "Mike"
  }
}
into
{
  "cashier": "Mike"
}
then Kafka Streams or ksqlDB are the common recommendations (that is, consume the topic elsewhere and produce the transformed records to a new topic, which the connector reads instead).
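As a minimal sketch of the Kafka Streams route (the bootstrap server, application id, and output topic name document-flat are assumptions, not part of your setup), something like this would move attributes["3"] into a top-level cashier field:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class FlattenDocuments {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "document-flattener");
        // Assumption: adjust the bootstrap server to your cluster
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> documents = builder.stream("document");

        documents
            .mapValues(FlattenDocuments::flatten)
            // Hypothetical output topic: point the sink connector at this instead
            .to("document-flat");

        new KafkaStreams(builder.build(), props).start();
    }

    // Rewrite {"attributes":{"3":"Mike"}} as {"cashier":"Mike"}
    private static String flatten(String json) {
        try {
            ObjectNode root = (ObjectNode) MAPPER.readTree(json);
            ObjectNode flat = MAPPER.createObjectNode();
            if (root.path("attributes").has("3")) {
                flat.set("cashier", root.path("attributes").get("3"));
            }
            return MAPPER.writeValueAsString(flat);
        } catch (Exception e) {
            return json; // pass records we cannot parse through unchanged
        }
    }
}

You would then set the connector's topics to the new topic. On the "random id" part: the _id of document-2022-11-03+0+0 in your output comes from key.ignore=true, which makes the connector build document ids from topic+partition+offset; if you want to control the ids yourself, set the record key in the stream and use key.ignore=false.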
Logstash may also be an option, in place of that plus Kafka Connect.