
kafka-connect-elasticsearch: How to delete a document based on a header value in the Kafka message

I am getting up to speed with Kafka Connect and am trying to use the Kafka Connect Elasticsearch Service sink connector to move our data from Kafka to Elasticsearch. Our processing pipeline looks like this:

File record from S3 -> custom processing in the source application, which publishes to -> Kafka topic -> Kafka Connect -> Elasticsearch

This works for the create/update scenario. However, we also want to handle deletion of a file. Our application publishes an event for the delete action and records that action in a header of the Kafka message. Instead of updating the document in Elasticsearch with this delete action info, we would like to delete the document itself.

How can we use Kafka Connect to read this header value and issue a delete for the document with the given key in Elasticsearch?

Thanks for your help in advance.

Regards, Vikas

EDIT: Example of a message I am trying to transform:

[{
    "key": "fileid=05ffefea-a71d-4bb7-091e-08d8f9229806",
    "rownum": 0,
    "metadata": {
        "offset": 1468950,
        "partition": 3,
        "timestamp": 1617773161088,
        "__keysize": 43,
        "__valsize": 596
    },
    "headers": {
        "sub-tenant-id": "",
        "actiondate": "2021-04-07T05:26:01.0790010Z",
        "action": "uploaded",
        "contentversion": "V1"
    },
    "value": {
        "id": "fil.05ffefeaa71d4bb7091e08d8f9229806",
        "name": "4.txt",
        "volumeId": "vol.e25196dc9e2f460bb27308d8f8405691",
        "volumeName": "projdmck0405",
        "type": "text/plain",
        "subTenantId": "",
        "path": "/4.txt",
        "timeCreated": "2021-04-07T05:25:46.129Z",
        "timeModified": "2021-04-07T05:25:46.129Z",
        "urn": "urn:mycompany:product:test:app:file:fil.05ffefeaa71d4bb7091e08d8f9229806#/4.txt",
        "sizeInBytes": 76,
        "isUploaded": true,
        "archiveStatus": "None",
        "storageTier": "Standard",
        "eTag": "11fb9ec5531d90d571b331cc39e43175"
    }
}]

I am trying to add the action header field and value to the body of the message.

Here is the transform I configured, based on this example: https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-transform-common/transformations/examples/HeaderToField.headertofield.html

    "transforms": "dropNullRecords,headerToField",
    "transforms.headerToField.type": "com.github.jcustenborder.kafka.connect.transform.common.HeaderToField$Value",
    "transforms.headerToField.header.mappings": "action:STRING:actioninbody"

I first tried a mappings value of "action:STRING", following the example, and then noticed that the format is documented as:

The format is <header name>:<header type>[:field name].

What am I missing?

I was able to achieve this by writing a custom SMT (Single Message Transform). Inside the SMT I had access to the Connect record, including its headers and its value. I read the header values one by one, and when I encountered the value I was interested in, I set the Connect record's value to null. In addition, the Elasticsearch sink connector exposes the following parameter:

behavior.on.null.values
How to handle records with a non-null key and a null value (for example, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.

Type: string
Default: FAIL
Valid Values: (case insensitive) [DELETE, IGNORE, FAIL]
Importance: low

I set the value to DELETE, and the connector started deleting the corresponding documents from the ES index.
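
To make the three options concrete, here is a simplified model in plain Java of how a sink could treat a record with a null value. This is only a sketch of the documented semantics, not the connector's actual code: the names NullValueBehavior and NullValueSinkModel are made up for illustration, and the "operations" list stands in for requests that a real sink would send to Elasticsearch. It also assumes the document ID is taken from the record key, which is what lets a delete hit the right document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mirror of the three documented options. */
enum NullValueBehavior { IGNORE, DELETE, FAIL }

/** Hypothetical model of a sink; it records operations instead of calling Elasticsearch. */
final class NullValueSinkModel {
    private final NullValueBehavior behavior;
    final List<String> operations = new ArrayList<>();

    NullValueSinkModel(NullValueBehavior behavior) {
        this.behavior = behavior;
    }

    /** Handles one record; the key is assumed to be used as the document ID. */
    void put(String key, String value) {
        if (value != null) {
            operations.add("INDEX " + key);   // normal create/update path
            return;
        }
        switch (behavior) {
            case IGNORE:
                return;                        // tombstone is skipped silently
            case DELETE:
                operations.add("DELETE " + key); // tombstone removes the document
                return;
            case FAIL:
            default:
                throw new IllegalStateException("Null value for key " + key);
        }
    }
}
```

With DELETE, a record whose value was nulled out upstream turns into a delete for that key, while ordinary records are still indexed as before.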

I followed this example of a custom SMT from Confluent: https://github.com/confluentinc/kafka-connect-insert-uuid

It helped a lot in understanding the concept and the structure of the Connect record class itself.
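
For anyone who wants the gist of the transform without the Kafka Connect jars on the classpath, here is a minimal sketch of the header check in plain Java. It is not my actual SMT: SketchRecord and HeaderToTombstone are hypothetical stand-ins, and the header value "deleted" is an assumption (the sample message above only shows "uploaded"). In a real SMT the same logic would sit in Transformation.apply(R record), read the header with record.headers().lastWithName(...), and return record.newRecord(...) with a null value schema and a null value.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a Connect record, reduced to the parts this sketch needs. */
final class SketchRecord {
    final String key;
    final Map<String, String> headers;
    final String value; // null means the record is a tombstone

    SketchRecord(String key, Map<String, String> headers, String value) {
        this.key = key;
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.value = value;
    }
}

/** Hypothetical transform: nulls out the value when a header marks a delete. */
final class HeaderToTombstone {
    private final String headerName;  // e.g. "action"
    private final String deleteValue; // e.g. "deleted" (assumed value)

    HeaderToTombstone(String headerName, String deleteValue) {
        this.headerName = headerName;
        this.deleteValue = deleteValue;
    }

    SketchRecord apply(SketchRecord record) {
        String action = record.headers.get(headerName);
        if (action == null || !deleteValue.equalsIgnoreCase(action)) {
            return record; // create/update events pass through unchanged
        }
        // Keep the key and headers, drop the value so the sink sees a tombstone.
        return new SketchRecord(record.key, record.headers, null);
    }
}
```

The key has to survive the transform untouched, because the sink uses it to find the document to delete.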

Hopefully this helps someone else.
