Kafka Connect Elasticsearch sink: extract and transform values from JSON

I use the Elasticsearch Sink connector to stream data from Kafka to Elasticsearch, and I have the following question.

I have the following record structure in the Kafka topic document:

Partition : 0 
Offset: 0
Key: 
Value: 
{
  "attributes": {
    "3": "Mike"
  }
}
Timestamp: 2022-11-03 19:03:34.866

For this data I have the following index template in my Elasticsearch:

{
  "version": 1,
  "index_patterns": [
    "documents-*"
  ],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "cashier": {
        "type": "text"
      }
    }
  }
}

And I have the following Elasticsearch Sink Connector configuration:

{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "document, document-processing-error",
    "key.ignore": "true",
    "schema.ignore": "true",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "name": "elasticsearch-sink",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "flush.synchronously": "true",

    "transforms": "appendTimestampToIX",
    "transforms.appendTimestampToIX.type": "org.apache.kafka.connect.transforms.TimestampRouter",
    "transforms.appendTimestampToIX.topic.format": "${topic}-${timestamp}",
    "transforms.appendTimestampToIX.timestamp.format": "yyyy-MM-dd"
  }
}

In the output I have the following data in my index document-2022-11-03:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "document-2022-11-03",
                "_type": "_doc",
                "_id": "document-2022-11-03+0+0",
                "_score": 1.0,
                "_source": {
                    "attributes": {
                        "3": "Mike"
                    }
                }
            }
        ]
    }
}

This works fine, but I need an extra transformation for my data. For example, if attributes contains the key 3, I need to replace that field with the key cashier, flatten the structure, and use a random id for the document. So in the end I need the following output structure (for example):

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "document-2022-11-03",
                "_type": "_doc",
                "_id": "134DaBfWAE6AZUyKUAbjRksjXHTmP6hDxedGm4YhBnZW",
                "_score": 1.0,
                "_source": {
                      "cashier": "Mike"
                }
            }
        ]
    }
}

I tried to use the following config to rename the field, but it doesn't work for me:

"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "arrtubites.3:cashier"

How can I do this?

The ReplaceField transform does not work with nested attributes such as Maps or Objects; it only operates on top-level fields of either.
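
For contrast, a top-level rename does work. As a sketch: if the value were already flat, e.g. {"3": "Mike"}, a configuration like this would rename the field:

"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "3:cashier"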

If you want to convert

{
  "attributes": {
    "3": "Mike"
  }
}

Into

{
  "cashier": "Mike"
}

Then Kafka Streams or ksqlDB are the common recommendations (i.e. consume the topic elsewhere, apply the logic you want, and produce to a new topic).
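
For example, here is a minimal Kafka Streams sketch of that approach. The application id, the kafka:9092 bootstrap address, and the output topic document-flat are assumptions, not part of your setup:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class FlattenDocuments {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "document-flattener"); // assumed name
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");      // assumed address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("document")
               .mapValues(FlattenDocuments::flatten)
               .to("document-flat"); // point the sink connector at this topic

        new KafkaStreams(builder.build(), props).start();
    }

    // Turns {"attributes":{"3":"Mike"}} into {"cashier":"Mike"}.
    private static String flatten(String value) {
        try {
            JsonNode root = MAPPER.readTree(value);
            ObjectNode flat = MAPPER.createObjectNode();
            JsonNode cashier = root.path("attributes").path("3");
            if (!cashier.isMissingNode()) {
                flat.put("cashier", cashier.asText());
            }
            return MAPPER.writeValueAsString(flat);
        } catch (Exception e) {
            return value; // pass through records that are not valid JSON
        }
    }
}

The sink connector's topics setting would then point at document-flat, and the TimestampRouter transform keeps working as before (producing indices such as document-flat-2022-11-03). Note that a truly random document _id still needs a change on the sink side: with key.ignore=true the connector derives the _id from topic+partition+offset.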

Logstash may also be an option in place of that combination plus Kafka Connect.
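
As a sketch of that route, a minimal Logstash pipeline could do the rename and the indexing itself; the broker address and index pattern here are assumptions mirroring the setup above:

input {
  kafka {
    bootstrap_servers => "kafka:9092"   # assumed broker address
    topics => ["document"]
    codec => "json"
  }
}

filter {
  mutate {
    # [attributes][3] is Logstash's field-reference syntax for the nested key
    rename => { "[attributes][3]" => "cashier" }
    remove_field => ["attributes"]
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "document-%{+YYYY-MM-dd}"
    # no document_id set, so Elasticsearch generates a random _id
  }
}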
