
kafka-connect-elasticsearch: How to delete a document based on a certain value in the header of the Kafka topic

I am getting up to speed with Kafka Connect. I am trying to use the Kafka Connect Elasticsearch Service sink connector to move our data from Kafka to Elasticsearch. I have a processing stream that looks like this:

File record from S3 -> custom processing in the source application, which publishes to -> Kafka topic -> Kafka Connect -> Elasticsearch

This works for the create/update scenario. However, we also want to handle the delete scenario for the file. Our application publishes an event for the delete action and sets it as part of a header value in the Kafka message. Instead of updating the document in Elasticsearch with this delete-action info, we would like to delete the document itself.

How can we achieve this using Kafka Connect, i.e. read this header value and issue a delete of the document with the given key from Elasticsearch?

Thanks in advance for your help.

Regards, Vikas

EDITED: Example of the message I am trying to transform:

[{
    "key": "fileid=05ffefea-a71d-4bb7-091e-08d8f9229806",
    "rownum": 0,
    "metadata": {
        "offset": 1468950,
        "partition": 3,
        "timestamp": 1617773161088,
        "__keysize": 43,
        "__valsize": 596
    },
    "headers": {
        "sub-tenant-id": "",
        "actiondate": "2021-04-07T05:26:01.0790010Z",
        "action": "uploaded",
        "contentversion": "V1"
    },
    "value": {
        "id": "fil.05ffefeaa71d4bb7091e08d8f9229806",
        "name": "4.txt",
        "volumeId": "vol.e25196dc9e2f460bb27308d8f8405691",
        "volumeName": "projdmck0405",
        "type": "text/plain",
        "subTenantId": "",
        "path": "/4.txt",
        "timeCreated": "2021-04-07T05:25:46.129Z",
        "timeModified": "2021-04-07T05:25:46.129Z",
        "urn": "urn:mycompany:product:test:app:file:fil.05ffefeaa71d4bb7091e08d8f9229806#/4.txt",
        "sizeInBytes": 76,
        "isUploaded": true,
        "archiveStatus": "None",
        "storageTier": "Standard",
        "eTag": "11fb9ec5531d90d571b331cc39e43175"
    }
}]

I am trying to add the `action` header field and its value to the body of the message.

Here is the transform I used, following this example: https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-transform-common/transformations/examples/HeaderToField.headertofield.html

"transforms"                               : "dropNullRecords,headerToField",
"transforms.headerToField.type"            : "com.github.jcustenborder.kafka.connect.transform.common.HeaderToField$Value",
"transforms.headerToField.header.mappings" : "action:STRING:actioninbody"

I first tried this with a mappings value of "action:STRING", just following the example; then I noticed the format documented as:

The format is <header name>:<header type>[:field name]. 
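Per that format, the third segment is optional; when it is omitted, the header name should double as the field name in the value. A hedged sketch of both forms (the `actioninbody` field name is just an illustration):

```properties
# Two-part form: the copied field in the value is also named "action"
transforms.headerToField.header.mappings=action:STRING
# Three-part form: header "action" is copied into a value field named "actioninbody"
transforms.headerToField.header.mappings=action:STRING:actioninbody
```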

What am I missing?

I was able to achieve this. I ended up writing a custom SMT. Through the SMT I had access to the Connect record, including the headers and the value. I read the header values one by one and, when I encountered the value I was interested in, I set the Connect record's value to null. In addition, the Elasticsearch sink connector exposes the following parameter:

behavior.on.null.values
How to handle records with a non-null key and a null value (for example, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.

Type: string
Default: FAIL
Valid Values: (case insensitive) [DELETE, IGNORE, FAIL]
Importance: low
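Putting the two pieces together, the sink side can be configured to turn the tombstones emitted by the custom SMT into deletes. A hedged sketch of the relevant sink connector properties (the topic name, transform alias, and SMT class `com.mycompany.kafka.smt.DeleteOnActionHeader` are placeholders; `behavior.on.null.values` is the real connector setting):

```properties
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=my-file-topic
# Turn records with a non-null key and a null value (tombstones) into document deletes
behavior.on.null.values=DELETE
# Hypothetical custom SMT that nulls the value when the delete-action header is seen
transforms=deleteOnHeader
transforms.deleteOnHeader.type=com.mycompany.kafka.smt.DeleteOnActionHeader
```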

I set the value to DELETE, and it started deleting the records from the ES index.

I followed this example of a custom SMT from Confluent: https://github.com/confluentinc/kafka-connect-insert-uuid

It helped a lot in understanding the concept and the Connect record class structure itself.
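The core decision inside the custom SMT can be sketched as plain Java (this is a simplified model, not the real `Transformation.apply(ConnectRecord)` signature, and the header value `"deleted"` is an assumed example; the message above only shows `"action": "uploaded"`):

```java
import java.util.Map;

class DeleteActionSmt {
    // Inside a real SMT, apply(ConnectRecord) would iterate record.headers(),
    // then call record.newRecord(...) with a null value to emit a tombstone.
    // Simplified here: given the headers and the current value, return null
    // (a tombstone) when the "action" header signals a delete, else pass through.
    static Object applyValue(Map<String, String> headers, Object value) {
        String action = headers.get("action");
        if ("deleted".equalsIgnoreCase(action)) {   // assumed delete-action value
            return null; // null value + behavior.on.null.values=DELETE => ES delete
        }
        return value;
    }
}
```

With `behavior.on.null.values=DELETE` on the sink, the null value produced here is what triggers the document delete in Elasticsearch.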

Hopefully this helps someone else.

