简体   繁体   中英

Message order with Kafka Connect Elasticsearch Connector

We are having problems enforcing the order in which messages from a Kafka topic are sent to Elasticsearch using the Kafka Connect Elasticsearch Connector. In the topic the messages are in the right order with the correct offsets, but if there are two messages with the same ID created in quick succession, they are intermittently sent to Elasticsearch in the wrong order. This causes Elasticsearch to have the data from the second last message, not from the last message. If we add some artificial delay of a second or two between the two messages in the topic, the problem disappears.

The documentation here states:

Document-level update ordering is ensured by using the partition-level Kafka offset as the document version, and using version_mode=external .

However I can't find any documentation anywhere about this version_mode setting, and whether it's something we need to set ourselves somewhere.

In the log files from the Kafka Connect system we can see the two messages (for the same ID) being processed in the wrong order, a few milliseconds apart. It might be significant that it looks like these are processed in different threads. Also note that there is only one partition for this topic, so all messages are in the same partition.

Below is the log snippet, slightly edited for clarity. The messages in the Kafka topic are populated by Debezium, which I don't think is relevant to the problem, but handily happens to include a timestamp value. This shows that the messages are processed in the wrong order (though they're in the correct order in the Kafka topic, populated by Debezium):

[2019-01-17 09:10:05,671] DEBUG http-outgoing-1 >> "
{
  "op": "u",
  "before": {
    "id": "ac025cb2-1a37-11e9-9c89-7945a1bd7dd1",
    ... << DATA FROM BEFORE SECOND UPDATE >> ...
  },
  "after": {
    "id": "ac025cb2-1a37-11e9-9c89-7945a1bd7dd1",
    ... << DATA FROM AFTER SECOND UPDATE >> ...
  },
  "source": { ... },
  "ts_ms": 1547716205205
}
" (org.apache.http.wire)

...

[2019-01-17 09:10:05,696] DEBUG http-outgoing-2 >> "
{
  "op": "u",
  "before": {
    "id": "ac025cb2-1a37-11e9-9c89-7945a1bd7dd1",
    ... << DATA FROM BEFORE FIRST UPDATE >> ...
  },
  "after": {
    "id": "ac025cb2-1a37-11e9-9c89-7945a1bd7dd1",
    ... << DATA FROM AFTER FIRST UPDATE >> ...
  },
  "source": { ... },
  "ts_ms": 1547716204190
}
" (org.apache.http.wire)

Does anyone know how to force this connector to maintain message order for a given document ID when sending the messages to Elasticsearch?

The problem was that our Elasticsearch connector had the key.ignore configuration set to true .

We spotted this line in the Github source for the connector (in DataConverter.java ):

final Long version = ignoreKey ? null : record.kafkaOffset();

This meant that, with key.ignore=true , the indexing operations that were being generated and sent to Elasticsearch were effectively "versionless" ... basically, the last set of data that Elasticsearch received for a document would replace any previous data, even if it was "old data".

From looking at the log files, the connector seems to have several consumer threads reading the source topic, then passing the transformed messages to Elasticsearch, but the order that they are passed to Elasticsearch is not necessarily the same as the topic order.

Using key.ignore=false , each Elasticsearch message now contains a version value equal to the Kafka record offset, and Elasticsearch refuses to update the index data for a document if it has already received data for a later "version".

That wasn't the only thing that fixed this. We still had to apply a transform to the Debezium message from the Kafka topic to get the key into a plain text format that Elasticsearch was happy with:

"transforms": "ExtractKey",
"transforms.ExtractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.ExtractKey.field": "id"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM