Kafka Connect: Multiple DB2 JDBC Source Connectors fail

I'm trying to use Kafka Connect in a local Docker container (using the official Confluent image) in order to push DB2 data to a Kafka cluster on Openshift (on AWS). I'm using the Confluent JDBC connector with my DB2 JDBC jar. I have different connector configs since I use the SMT "transforms.createKey" (to create my key) and the key columns in my tables have different names.

Here are my steps:

  • create the topics Kafka Connect needs for config, offset and status
  • start/create the Kafka Connect container (env vars: see below)
  • create the first JDBC connector via a POST call to my Connect container (config: see below; a sketch of these calls follows this list)
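
Roughly, steps 1 and 3 look like this on my side (a sketch only: the SSL client config file and the connector JSON file name are placeholders, the replication factor matches the worker config below, and older kafka-topics versions take --zookeeper instead of --bootstrap-server):

# internal Kafka Connect topics: the config topic needs exactly one partition, all three should be compacted
kafka-topics --bootstrap-server my_kafka_cluster:443 --command-config client-ssl.properties \
  --create --topic kafka-connect-config --partitions 1 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --bootstrap-server my_kafka_cluster:443 --command-config client-ssl.properties \
  --create --topic kafka-connect-offset --partitions 25 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --bootstrap-server my_kafka_cluster:443 --command-config client-ssl.properties \
  --create --topic kafka-connect-status --partitions 5 --replication-factor 3 --config cleanup.policy=compact

# register the first connector against the local Connect REST API (the JSON file contains the config shown further below)
curl -X POST -H "Content-Type: application/json" \
  --data @db2-jdbc-source.json \
  http://localhost:8083/connectors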

So far everything works well and I can see my data being pushed to the cluster. However, as soon as I add a second JDBC connector via a POST call, the first connector stops pushing data to the cluster while the second one starts and continues to load and push data. There is a short window where both connectors seem to push data to the cluster, but I assume this might be data from connector 1 that is still being flushed. The problem is that a) even trace logs do not show an error that is meaningful (to me at least) and b) the errors that are shown differ between tries (I always deleted all topics and the container between tries).

I'm assuming this is not a bug but rather a combination of configs that need to be set appropriately, and/or I'm lacking understanding of some basic Kafka Connect core functionality. I've already tried adding and changing various configs, but despite a good number of tries nothing has worked so far. I've attached the logs of my two most recent tries as well as the configs.
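
For what it's worth, this is roughly how I check connector and task state while reproducing the issue (localhost:8083 is my local Connect container; the name of the second connector is just an example):

curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/db2-jdbc-source/status
curl -s http://localhost:8083/connectors/db2-jdbc-source-2/status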

Does anyone have an idea which config I could adapt or what to look into in order to fix this? Any help is appreciated - thanks!


Kafka: 2.0.0
Docker image: confluentinc/cp-kafka-connect:5.0.0
DB2: 10.5
JDBC Jar: db2jcc4.jar with version 4.19.76

Logs 1st try:

[2018-12-17 13:09:15,683] ERROR Invalid call to OffsetStorageWriter flush() while already flushing, the framework should not allow this (org.apache.kafka.connect.storage.OffsetStorageWriter)
[2018-12-17 13:09:15,684] ERROR WorkerSourceTask{id=db2-jdbc-source-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: OffsetStorageWriter is already flushing
    at org.apache.kafka.connect.storage.OffsetStorageWriter.beginFlush(OffsetStorageWriter.java:110)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:409)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:238)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2018-12-17 13:09:15,686] ERROR WorkerSourceTask{id=db2-jdbc-source-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
[2018-12-17 13:09:15,686] INFO [Producer clientId=producer-4] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2018-12-17 13:09:20,682] ERROR Graceful stop of task db2-jdbc-source-0 failed. (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 13:09:20,682] INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder)

Logs 2nd try:

[2018-12-17 14:01:31,658] INFO Stopping task db2-jdbc-source-0 (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 14:01:31,689] INFO Stopped connector db2-jdbc-source (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 14:01:31,784] INFO WorkerSourceTask{id=db2-jdbc-source-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-12-17 14:01:31,784] INFO WorkerSourceTask{id=db2-jdbc-source-0} flushing 20450 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-12-17 14:01:36,733] ERROR Graceful stop of task db2-jdbc-source-0 failed. (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 14:01:36,733] INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder)

Screenshot of incoming messages per second in the Kafka cluster

Kafka Connect Docker env variables:

-e CONNECT_BOOTSTRAP_SERVERS=my_kafka_cluster:443 \
  -e CONNECT_PRODUCER_BOOTSTRAP_SERVERS="my_kafka_cluster:443" \
  -e CONNECT_REST_ADVERTISED_HOST_NAME="kafka-connect" \
  -e CONNECT_REST_PORT=8083 \
  -e CONNECT_GROUP_ID="kafka-connect-group" \
  -e CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=3 \
  -e CONNECT_CONFIG_STORAGE_TOPIC="kafka-connect-config" \
  -e CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=3 \
  -e CONNECT_OFFSET_STORAGE_TOPIC="kafka-connect-offset" \
  -e CONNECT_OFFSET_FLUSH_INTERVAL_MS=15000 \
  -e CONNECT_OFFSET_FLUSH_TIMEOUT_MS=60000 \
  -e CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=3 \
  -e CONNECT_STATUS_STORAGE_TOPIC="kafka-connect-status" \
  -e CONNECT_KEY_CONVERTER="io.confluent.connect.avro.AvroConverter" \
  -e CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL=http://url_to_schemaregistry \
  -e CONNECT_VALUE_CONVERTER="io.confluent.connect.avro.AvroConverter" \
  -e CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=http://url_to_schemaregistry \
  -e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
  -e CONNECT_INTERNAL_KEY_CONVERTER_SCHEMAS_ENABLE="false" \
  -e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
  -e CONNECT_INTERNAL_VALUE_CONVERTER_SCHEMAS_ENABLE="false" \
  -e CONNECT_PLUGIN_PATH=/usr/share/java \
  -e CONNECT_PRODUCER_BUFFER_MEMORY="8388608" \
  -e CONNECT_SECURITY_PROTOCOL="SSL" \
  -e CONNECT_PRODUCER_SECURITY_PROTOCOL="SSL" \
  -e CONNECT_SSL_TRUSTSTORE_LOCATION="/usr/share/kafka.client.truststore.jks" \
  -e CONNECT_PRODUCER_SSL_TRUSTSTORE_LOCATION="/usr/share/kafka.client.truststore.jks" \
  -e CONNECT_SSL_TRUSTSTORE_PASSWORD="my_ts_pw" \
  -e CONNECT_PRODUCER_SSL_TRUSTSTORE_PASSWORD="my_ts_pw" \
  -e CONNECT_LOG4J_LOGGERS=org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR \
  -e CONNECT_LOG4J_ROOT_LOGLEVEL=INFO \
  -e HOSTNAME=kafka-connect \

JDBC connectors (only the tables and key columns vary):

{
    "name": "db2-jdbc-source",
    "config": 
    {
        "mode":"timestamp",
        "debug":"true",
        "batch.max.rows":"50",
        "poll.interval.ms":"10000",
        "timestamp.delay.interval.ms":"60000",
        "timestamp.column.name":"IBMSNAP_LOGMARKER",
        "connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector" ,
        "connection.url":"jdbc:db2://myip:myport/mydb:currentSchema=myschema;",
        "connection.password":"mypw",
        "connection.user":"myuser",
        "connection.backoff.ms":"60000",
        "dialect.name": "Db2DatabaseDialect",
        "table.types": "TABLE",
        "table.poll.interval.ms":"60000",
        "table.whitelist":"MYTABLE1",
        "tasks.max":"1",
        "topic.prefix":"db2_",
        "key.converter":"io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url":"http://url_to_schemaregistry",
        "value.converter":"io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url":"http://url_to_schemaregistry",
        "transforms":"createKey",
        "transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
        "transforms.createKey.fields":"MYKEY1"
    }
}
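
The second connector is identical except for its name, the whitelisted table and the key column used by the SMT (the connector name, MYTABLE2 and MYKEY2 are placeholders here):

{
    "name": "db2-jdbc-source-2",
    "config":
    {
        "mode":"timestamp",
        "debug":"true",
        "batch.max.rows":"50",
        "poll.interval.ms":"10000",
        "timestamp.delay.interval.ms":"60000",
        "timestamp.column.name":"IBMSNAP_LOGMARKER",
        "connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url":"jdbc:db2://myip:myport/mydb:currentSchema=myschema;",
        "connection.password":"mypw",
        "connection.user":"myuser",
        "connection.backoff.ms":"60000",
        "dialect.name": "Db2DatabaseDialect",
        "table.types": "TABLE",
        "table.poll.interval.ms":"60000",
        "table.whitelist":"MYTABLE2",
        "tasks.max":"1",
        "topic.prefix":"db2_",
        "key.converter":"io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url":"http://url_to_schemaregistry",
        "value.converter":"io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url":"http://url_to_schemaregistry",
        "transforms":"createKey",
        "transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
        "transforms.createKey.fields":"MYKEY2"
    }
}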

I eventually figured out the problem: I'm using the JDBC connector in timestamp mode and not timestamp+incrementing, because I cannot (always) specify an incrementing column. I was aware that this might lead to the problem that Connect cannot know which entries have already been read when there are multiple rows with the same timestamp.

A big part of my data rows share the same timestamp. When I added the second connector, the current timestamp of the first connector was committed as its offset, Connect started rebalancing, and the information about which rows for that timestamp had already been read was lost. When the connectors were up and running again, the first connector continued with "the next timestamp" and hence only loaded the newest rows (which are only a small part of the data).
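
To illustrate why rows go missing: in timestamp mode the connector effectively polls with a query along these lines (a simplified sketch, not the exact SQL the connector generates):

SELECT *
FROM MYSCHEMA.MYTABLE1
WHERE IBMSNAP_LOGMARKER > ?   -- last committed timestamp offset
  AND IBMSNAP_LOGMARKER < ?   -- current time minus timestamp.delay.interval.ms
ORDER BY IBMSNAP_LOGMARKER ASC

Because the lower bound is strictly greater than the committed offset, rows that share the committed timestamp but had not been flushed before the rebalance are never picked up again.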

My mistake was assuming that, in a situation like this, the first connector would restart from the previous timestamp rather than continuing with the "next timestamp". It would have made more sense to me to risk duplicates rather than potentially miss data.
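
For completeness: whenever a table does have a strictly increasing column, switching to timestamp+incrementing avoids this, because the offset then also records the ID of the last processed row, so rows sharing a timestamp can be disambiguated. A sketch of the relevant config keys (MYID1 is a placeholder):

"mode":"timestamp+incrementing",
"timestamp.column.name":"IBMSNAP_LOGMARKER",
"incrementing.column.name":"MYID1"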
