
Kafka Connect Elasticsearch sink - using if-else blocks to extract and transform fields for different topics

I have a Kafka Connect Elasticsearch sink connector properties file like the following:

name=elasticsearch.sink.direct
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=16
topics=data.my_setting

connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.my_setting:direct_my_setting_index
batch.size=2048
max.buffered.records=32768
flush.timeout.ms=60000
max.retries=10
retry.backoff.ms=1000
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID
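
For context, this transform chain first copies MY_SETTING_ID out of the record value into a new single-field Struct key (the InsertKey step), then unwraps that Struct so the key becomes the bare field value (the ExtractId step). Using the sample payload shown further down, the key evolves roughly like this (an illustrative sketch, not actual connector output):

key before transforms:  null
after InsertKey:        Struct{MY_SETTING_ID=9}
after ExtractId:        9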

This works perfectly for a single topic (data.my_setting). I would like to use the same connector for data coming in from more than one topic. A message in a different topic will have a different key, which I'll need to transform. I was wondering if there's a way to use if-else statements with a condition on the topic name, or on a single field in the message, so that I can transform the key differently. All the incoming messages are JSON with schema and payload.

UPDATE based on the answer:

In my JDBC source connector I add the key as follows:

name=data.my_setting
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
poll.interval.ms=500
tasks.max=4
mode=timestamp
query=SELECT * FROM MY_TABLE with (nolock)
timestamp.column.name=LAST_MOD_DATE
topic.prefix=investment.ed.data.app_setting

transforms=ValueToKey
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID

However, I still get this error when a message produced by this connector is read by the Elasticsearch sink:

org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
Caused by: org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id
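
Presumably this means the key arriving at the sink is still the single-field Struct created by ValueToKey. One thing I could try (a sketch mirroring the InsertKey/ExtractId pair already in the sink config above) is chaining ExtractField$Key in the source connector as well, so the topic carries a primitive int32 key instead of a Struct:

transforms=ValueToKey,ExtractId
transforms.ValueToKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.ValueToKey.fields=MY_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=MY_SETTING_ID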

The payload looks like this:

{
    "schema": {
        "type": "struct",
        "fields": [{
            "type": "int32",
            "optional": false,
            "field": "MY_SETTING_ID"
        }, {
            "type": "string",
            "optional": true,
            "field": "MY_SETTING_NAME"
        }],
        "optional": false
    },
    "payload": {
        "MY_SETTING_ID": 9,
        "MY_SETTING_NAME": "setting_name"
    }
}

The Connect standalone worker properties file looks like this:

bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter 
value.converter=org.apache.kafka.connect.json.JsonConverter 
converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter 
internal.value.converter=org.apache.kafka.connect.json.JsonConverter 
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/apps/{env}/logs/infrastructure/offsets/connect.offsets
rest.port=8084
plugin.path=/usr/share/java

Is there a way to achieve my goal, which is to have messages from multiple topics (in my case DB tables), each with its own unique ID (which will also be the ID of a document in ES), sent through a single Elasticsearch sink?

Can I use Avro for this task? Is there a way to define the key in the Schema Registry, or will I run into the same problem?

This isn't possible; you'd need multiple connectors if the key fields are different.
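
For example, covering a hypothetical second topic data.other_setting keyed on OTHER_SETTING_ID would mean a second sink config that is a near-copy of the first (the topic, index, and field names below are made up for illustration):

name=elasticsearch.sink.other
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=data.other_setting
connection.url=http://dev-elastic-search01:9200
type.name=logs
topic.index.map=data.other_setting:direct_other_setting_index
schema.ignore=true
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=OTHER_SETTING_ID
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=OTHER_SETTING_ID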

One option to think about is pre-processing your Kafka topics through a stream processor (e.g. Kafka Streams, KSQL, Spark Streaming) to standardise the key fields, so that you can then use a single connector. Whether that's worth doing or overkill depends on what you're building.
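
As a rough sketch of that re-keying in KSQL (the stream and column names are assumed from the question's example, and you'd repeat the pattern once per source topic):

CREATE STREAM my_setting_src (MY_SETTING_ID INT, MY_SETTING_NAME VARCHAR)
  WITH (KAFKA_TOPIC='data.my_setting', VALUE_FORMAT='JSON');

CREATE STREAM my_setting_rekeyed AS
  SELECT * FROM my_setting_src PARTITION BY MY_SETTING_ID;

The re-keyed output topics, all keyed the same way, could then feed a single connector.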
