
How can I configure Debezium's MongoDB source connector to send the pk fields in the record_value as expected by the Postgres JDBC sink connector?

I'm trying to link a MongoDB (version 4.2.0) with a PostgresDB (version 10) via Kafka Connect using Debezium's MongoDB Source Connector (version 1.5.0) and Confluent's JDBC Sink Connector (version 10.1.1).

I have the following configurations for the source connector:

name=mongodb-debezium-source-connector
connector.class=io.debezium.connector.mongodb.MongoDbConnector
tasks.max=1
mongodb.hosts=rs0/127.0.0.1:27019
mongodb.name=gtmhub
tombstones.on.delete=true

And for the sink connector I have:

name=postgres-sink-connector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=gtmhub.gtmhub.goals
connection.url=jdbc:postgresql://localhost:5432/gtmhub
connection.user=user
connection.password=password
auto.create=true
insert.mode=upsert
pk.fields=_id
pk.mode=record_value
transforms=unwrap
transforms.unwrap.type=io.debezium.connector.mongodb.transforms.ExtractNewDocumentState
transforms.unwrap.drop.tombstones=false
transforms.unwrap.delete.handling.mode=drop
transforms.unwrap.operation.header=true

I am running Kafka Connect in standalone mode. I have the following messages being published to the gtmhub.gtmhub.goals topic by the source connector:

{"schema":{"type":"struct","fields":[{"type":"string","optional":true,"name":"io.debezium.data.Json","version":1,"field":"after"},{"type":"string","optional":true,"name":"io.debezium.data.Json","version":1,"field":"patch"},{"type":"string","optional":true,"name":"io.debezium.data.Json","version":1,"field":"filter"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":false,"field":"rs"},{"type":"string","optional":false,"field":"collection"},{"type":"int32","optional":false,"field":"ord"},{"type":"int64","optional":true,"field":"h"},{"type":"int64","optional":true,"field":"tord"},{"type":"string","optional":true,"field":"stxnid"}],"optional":false,"name":"io.debezium.connector.mongo.Source","field":"source"},{"type":"string","optional":true,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"gtmhub.gtmhub.goals.Envelope"},"payload":{"after":"{\"_id\": {\"$oid\": \"607ff0a569460208cb3aa3f4\"},\"accountId\": {\"$oid\": \"604f3dda3935ce0001ce97ef\"},\"sessionId\": {\"$oid\": \"605b66dccc3b499a4a0d0afa\"},\"name\": \"alabala1\",\"description\": \"\",\"ownerId\": {\"$oid\": \"604f3dda3935ce0001ce97f0\"},\"dateCreated\": {\"$date\": 1616603048895},\"dateFrom\": {\"$date\": 1616450400000},\"dateTo\": {\"$date\": 1625086799999},\"createdById\": {\"$oid\": \"604f3dda3935ce0001ce97f0\"},\"attainment\": 0.0,\"aggregatedAttainment\": 0.0,\"fullAggregatedAttainment\": 0.0,\"childrenAggregatedAttainment\": 0.0,\"metricsAttainment\": 0.0,\"fullSubTreeCount\": 0.0,\"private\": false,\"isDeleted\": false}","patch":null,"filter":null,"source":{"version":"1.5.0.Final","connector":"mongodb","name":"gtmhub","ts_ms":1619002912000,"snapshot":"true","db":"gtmhub","sequence":null,"rs":"rs0","collection":"goals","ord":1,"h":0,"tord":null,"stxnid":null},"op":"r","ts_ms":1619002912791,"transaction":null}}
{"schema":{"type":"struct","fields":[{"type":"string","optional":true,"name":"io.debezium.data.Json","version":1,"field":"after"},{"type":"string","optional":true,"name":"io.debezium.data.Json","version":1,"field":"patch"},{"type":"string","optional":true,"name":"io.debezium.data.Json","version":1,"field":"filter"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":false,"field":"rs"},{"type":"string","optional":false,"field":"collection"},{"type":"int32","optional":false,"field":"ord"},{"type":"int64","optional":true,"field":"h"},{"type":"int64","optional":true,"field":"tord"},{"type":"string","optional":true,"field":"stxnid"}],"optional":false,"name":"io.debezium.connector.mongo.Source","field":"source"},{"type":"string","optional":true,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"gtmhub.gtmhub.goals.Envelope"},"payload":{"after":null,"patch":"{\"$v\": 1,\"$set\": {\"name\": \"alabala\"}}","filter":"{\"_id\": {\"$oid\": \"607ff0a569460208cb3aa3f4\"}}","source":{"version":"1.5.0.Final","connector":"mongodb","name":"gtmhub","ts_ms":1619006391000,"snapshot":"false","db":"gtmhub","sequence":null,"rs":"rs0","collection":"goals","ord":1,"h":0,"tord":null,"stxnid":"ec07e1d8-299a-3074-a47e-de21c3b8348c:1"},"op":"u","ts_ms":1619006392023,"transaction":null}}

These cause the sink connector to fail with the error below:

INFO Attempting to open connection #1 to PostgreSql (io.confluent.connect.jdbc.util.CachedConnectionProvider:82)
[2021-04-21 14:59:52,077] INFO JdbcDbWriter Connected (io.confluent.connect.jdbc.sink.JdbcDbWriter:56)
[2021-04-21 14:59:52,089] ERROR WorkerSinkTask{id=postgres-sink-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: PK mode for table 'goals' is RECORD_VALUE with configured PK fields [_id], but record value schema does not contain field: _id (org.apache.kafka.connect.runtime.WorkerSinkTask:612)
org.apache.kafka.connect.errors.ConnectException: PK mode for table 'goals' is RECORD_VALUE with configured PK fields [_id], but record value schema does not contain field: _id
    at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extractRecordValuePk(FieldsMetadata.java:280)
    at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:105)
    at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:67)
    at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:116)
    at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:74)
    at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:84)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The whole setup worked just fine for inserting data into the Postgres database when the sink connector was in insert mode. The only problem was that update operations were treated as new rows in Postgres, which is of course unwanted behavior. Therefore, I added insert.mode=upsert to the sink connector's configuration, along with pk.fields and pk.mode as per the documentation. Unfortunately, I ended up with the error above: the sink connector is unable to extract the pk field from the record_value. However, since I am a novice with Kafka, I can't figure out how to configure the source connector so that the messages it produces contain the necessary pk.field in the record_value.

Any help and/ or advice is most welcome!

Try the following sink connector config:

pk.fields=id
pk.mode=record_key
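
For context, here is a minimal sketch of the full adjusted sink configuration with that suggestion applied, assuming the Debezium MongoDB record key exposes the document id in a single field named id (all other settings are kept from the question):

name=postgres-sink-connector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=gtmhub.gtmhub.goals
connection.url=jdbc:postgresql://localhost:5432/gtmhub
connection.user=user
connection.password=password
auto.create=true
insert.mode=upsert
# derive the upsert key from the Kafka record key's id field instead of the record value
pk.mode=record_key
pk.fields=id
transforms=unwrap
transforms.unwrap.type=io.debezium.connector.mongodb.transforms.ExtractNewDocumentState
transforms.unwrap.drop.tombstones=false
transforms.unwrap.delete.handling.mode=drop
transforms.unwrap.operation.header=true

With pk.mode=record_key the sink reads the primary key from the Kafka record key rather than from the flattened record value, which is why the value no longer needs to contain an _id field.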
