Spark streaming app resets kafka offset continuously

Question

I have a spark streaming app that runs on spark cluster with 4 node. A few days ago the app keeps resetting Kafka offset and does not fetch Kafka data anymore while the AUTO OFFSET RESET is set,

this is the log:

22/06/28 21:39:38 INFO AppInfoParser: Kafka version : 2.0.0
18|stream  | 22/06/28 21:39:38 INFO AppInfoParser: Kafka commitId : 3402a8361b734732
18|stream  | 22/06/28 21:39:39 INFO Metadata: Cluster ID: 3cAbAp6-QNyO1cKEc1dtUA
18|stream  | 22/06/28 21:39:39 INFO AbstractCoordinator: [Consumer clientId=consumer-1, groupId=testid1] Discovered group coordinator xxx.xxx.xxx.xxx:9092 (id: 2147483647 rack: null)
18|stream  | 22/06/28 21:39:39 INFO ConsumerCoordinator: [Consumer clientId=consumer-1, groupId=testid1] Revoking previously assigned partitions []
18|stream  | 22/06/28 21:39:39 INFO AbstractCoordinator: [Consumer clientId=consumer-1, groupId=testid1] (Re-)joining group
18|stream  | 22/06/28 21:39:39 INFO AbstractCoordinator: [Consumer clientId=consumer-1, groupId=testid1] Successfully joined group with generation 9042
18|stream  | 22/06/28 21:39:39 INFO ConsumerCoordinator: [Consumer clientId=consumer-1, groupId=testid1] Setting newly assigned partitions [applog-15, applog-14, applog-13, applog-12, applog-11, applog-10, applog-9, new_apploglog-0, applog-8, applog-7, applog-6, applog-5, applog-4, applog-3, applog-2, applog-1, applog-0]
18|stream  | 22/06/28 21:39:39 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition new_apploglog-0 to offset 16767946.
18|stream  | 22/06/28 21:39:39 INFO RecurringTimer: Started timer for JobGenerator at time 1656452400000
18|stream  | 22/06/28 21:39:39 INFO JobGenerator: Started JobGenerator at 1656452400000 ms
18|stream  | 22/06/28 21:39:39 INFO JobScheduler: Started JobScheduler
18|stream  | 22/06/28 21:39:39 INFO StreamingContext: StreamingContext started
18|stream  | 22/06/28 21:39:40 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (xxx.xxx.xxx.xxx:46588) with ID 2
18|stream  | 22/06/28 21:39:40 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (xxx.xxx.xxx.xxx:48860) with ID 3
18|stream  | 22/06/28 21:39:40 INFO BlockManagerMasterEndpoint: Registering block manager xxx.xxx.xxx.xxx:35981 with 4.6 GB RAM, BlockManagerId(2, xxx.xxx.xxx.xxx, 35981, None)
18|stream  | 22/06/28 21:39:40 INFO BlockManagerMasterEndpoint: Registering block manager xxx.xxx.xxx.xxx:40001 with 4.6 GB RAM, BlockManagerId(3, xxx.xxx.xxx.xxx, 40001, None)
18|stream  | 22/06/28 21:39:40 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (xxx.xxx.xxx.xxx:39858) with ID 1
18|stream  | 22/06/28 21:39:40 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (xxx.xxx.xxx.xxx:57696) with ID 0
18|stream  | 22/06/28 21:39:40 INFO BlockManagerMasterEndpoint: Registering block manager xxx.xxx.xxx.xxx:44765 with 4.6 GB RAM, BlockManagerId(1, xxx.xxx.xxx.xxx, 44765, None)
18|stream  | 22/06/28 21:39:40 INFO BlockManagerMasterEndpoint: Registering block manager xxx.xxx.xxx.xxx:46661 with 4.6 GB RAM, BlockManagerId(0, xxx.xxx.xxx.xxx, 46661, None)
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-15 to offset 285007408.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-14 to offset 285006512.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-13 to offset 285006673.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-12 to offset 285006392.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-11 to offset 285006399.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-10 to offset 285006961.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-9 to offset 285007334.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition new_apploglog-0 to offset 16838546.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-8 to offset 285007057.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-7 to offset 285005614.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-6 to offset 285007348.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-5 to offset 285004512.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-4 to offset 285005570.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-3 to offset 285008145.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-2 to offset 285007214.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-1 to offset 285007686.
18|stream  | 22/06/28 21:40:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-0 to offset 316632614.
18|stream  | 22/06/28 21:40:00 INFO JobScheduler: Added jobs for time 1656452400000 ms
18|stream  | 22/06/28 21:40:00 INFO JobScheduler: Starting job streaming job 1656452400000 ms.0 from job set of time 1656452400000 ms
18|stream  | 22/06/28 21:40:00 INFO SparkContext: Starting job: collect at Main.scala:76
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Registering RDD 1 (repartition at Main.scala:69)
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Got job 0 (collect at Main.scala:76) with 16 output partitions
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Final stage: ResultStage 1 (collect at Main.scala:76)
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[1] at repartition at Main.scala:69), which has no missing parents
18|stream  | 22/06/28 21:40:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.9 KB, free 5.2 GB)
18|stream  | 22/06/28 21:40:00 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.1 KB, free 5.2 GB)
18|stream  | 22/06/28 21:40:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx.xxx.xxx.xxx:41399 (size: 3.1 KB, free: 5.2 GB)
18|stream  | 22/06/28 21:40:00 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
18|stream  | 22/06/28 21:40:00 INFO DAGScheduler: Submitting 17 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[1] at repartition at Main.scala:69) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
18|stream  | 22/06/28 21:40:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 17 tasks
18|stream  | 22/06/28 21:40:00 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 0, xxx.xxx.xxx.xxx, executor 0, partition 1, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:00 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 1, xxx.xxx.xxx.xxx, executor 1, partition 8, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:00 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 2, xxx.xxx.xxx.xxx, executor 2, partition 3, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 3, xxx.xxx.xxx.xxx, executor 3, partition 0, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx.xxx.xxx.xxx:44765 (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx.xxx.xxx.xxx:40001 (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx.xxx.xxx.xxx:35981 (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx.xxx.xxx.xxx:46661 (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:02 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 4, xxx.xxx.xxx.xxx, executor 1, partition 10, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 5, xxx.xxx.xxx.xxx, executor 3, partition 4, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 1) in 2181 ms on xxx.xxx.xxx.xxx (executor 1) (1/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 3) in 2187 ms on xxx.xxx.xxx.xxx (executor 3) (2/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 6, xxx.xxx.xxx.xxx, executor 2, partition 7, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 2) in 2463 ms on xxx.xxx.xxx.xxx (executor 2) (3/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 7, xxx.xxx.xxx.xxx, executor 3, partition 5, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 5) in 343 ms on xxx.xxx.xxx.xxx (executor 3) (4/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 8, xxx.xxx.xxx.xxx, executor 1, partition 13, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 4) in 389 ms on xxx.xxx.xxx.xxx (executor 1) (5/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 9, xxx.xxx.xxx.xxx, executor 0, partition 2, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 0) in 2773 ms on xxx.xxx.xxx.xxx (executor 0) (6/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 10, xxx.xxx.xxx.xxx, executor 2, partition 11, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 6) in 403 ms on xxx.xxx.xxx.xxx (executor 2) (7/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 11, xxx.xxx.xxx.xxx, executor 3, partition 9, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 7) in 362 ms on xxx.xxx.xxx.xxx (executor 3) (8/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 12, xxx.xxx.xxx.xxx, executor 1, partition 15, PROCESS_LOCAL, 7746 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 8) in 369 ms on xxx.xxx.xxx.xxx (executor 1) (9/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 13, xxx.xxx.xxx.xxx, executor 1, partition 16, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 12) in 146 ms on xxx.xxx.xxx.xxx (executor 1) (10/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 11) in 247 ms on xxx.xxx.xxx.xxx (executor 3) (11/17)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 14, xxx.xxx.xxx.xxx, executor 0, partition 6, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:03 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 9) in 382 ms on xxx.xxx.xxx.xxx (executor 0) (12/17)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 15, xxx.xxx.xxx.xxx, executor 2, partition 14, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 10) in 337 ms on xxx.xxx.xxx.xxx (executor 2) (13/17)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 13) in 331 ms on xxx.xxx.xxx.xxx (executor 1) (14/17)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 16, xxx.xxx.xxx.xxx, executor 0, partition 12, PROCESS_LOCAL, 7748 bytes)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 14) in 303 ms on xxx.xxx.xxx.xxx (executor 0) (15/17)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 15) in 271 ms on xxx.xxx.xxx.xxx (executor 2) (16/17)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 16) in 291 ms on xxx.xxx.xxx.xxx (executor 0) (17/17)
18|stream  | 22/06/28 21:40:04 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: ShuffleMapStage 0 (repartition at Main.scala:69) finished in 4.222 s
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: looking for newly runnable stages
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: running: Set()
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: waiting: Set(ResultStage 1)
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: failed: Set()
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[6] at map at Main.scala:73), which has no missing parents
18|stream  | 22/06/28 21:40:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 5.2 GB)
18|stream  | 22/06/28 21:40:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.7 KB, free 5.2 GB)
18|stream  | 22/06/28 21:40:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx.xxx.xxx.xxx:41399 (size: 2.7 KB, free: 5.2 GB)
18|stream  | 22/06/28 21:40:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
18|stream  | 22/06/28 21:40:04 INFO DAGScheduler: Submitting 16 missing tasks from ResultStage 1 (MapPartitionsRDD[6] at map at Main.scala:73) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
18|stream  | 22/06/28 21:40:04 INFO TaskSchedulerImpl: Adding task set 1.0 with 16 tasks
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 17, xxx.xxx.xxx.xxx, executor 2, partition 0, NODE_LOCAL, 7942 bytes)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 18, xxx.xxx.xxx.xxx, executor 3, partition 1, NODE_LOCAL, 7942 bytes)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 19, xxx.xxx.xxx.xxx, executor 1, partition 2, NODE_LOCAL, 7942 bytes)
18|stream  | 22/06/28 21:40:04 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 20, xxx.xxx.xxx.xxx, executor 0, partition 3, NODE_LOCAL, 7942 bytes)
18|stream  | 22/06/28 21:40:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx.xxx.xxx.xxx (size: 2.7 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx.xxx.xxx.xxx:44765 (size: 2.7 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx.xxx.xxx.xxx:40001 (size: 2.7 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx.xxx.xxx.xxx:46661 (size: 2.7 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:04 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xxx.xxx.xxx.xxx:48860
18|stream  | 22/06/28 21:40:04 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xxx.xxx.xxx.xxx:39858
18|stream  | 22/06/28 21:40:04 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xxx.xxx.xxx.xxx:46588
18|stream  | 22/06/28 21:40:04 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xxx.xxx.xxx.xxx:57696
18|stream  | 22/06/28 21:40:32 INFO BlockManagerInfo: Removed broadcast_0_piece0 on xxx.xxx.xxx.xxx:41399 in memory (size: 3.1 KB, free: 5.2 GB)
18|stream  | 22/06/28 21:40:32 INFO BlockManagerInfo: Removed broadcast_0_piece0 on xxx.xxx.xxx.xxx:44765 in memory (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:32 INFO BlockManagerInfo: Removed broadcast_0_piece0 on xxx.xxx.xxx.xxx:35981 in memory (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:32 INFO BlockManagerInfo: Removed broadcast_0_piece0 on xxx.xxx.xxx.xxx:46661 in memory (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:40:32 INFO BlockManagerInfo: Removed broadcast_0_piece0 on xxx.xxx.xxx.xxx:40001 in memory (size: 3.1 KB, free: 4.6 GB)
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-15 to offset 285007532.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-14 to offset 285006636.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-13 to offset 285006799.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-12 to offset 285006518.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-11 to offset 285006525.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-10 to offset 285007087.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-9 to offset 285007459.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition new_apploglog-0 to offset 16838553.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-8 to offset 285007182.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-7 to offset 285005739.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-6 to offset 285007471.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-5 to offset 285004635.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-4 to offset 285005693.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-3 to offset 285008268.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-2 to offset 285007337.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-1 to offset 285007810.
18|stream  | 22/06/28 21:42:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-0 to offset 316632738.
18|stream  | 22/06/28 21:42:00 INFO JobScheduler: Added jobs for time 1656452520000 ms

18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-15 to offset 285007665.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-14 to offset 285006770.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-13 to offset 285006931.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-12 to offset 285006650.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-11 to offset 285006657.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-10 to offset 285007219.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-9 to offset 285007591.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition new_apploglog-0 to offset 16838556.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-8 to offset 285007314.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-7 to offset 285005871.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-6 to offset 285007603.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-5 to offset 285004767.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-4 to offset 285005825.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-3 to offset 285008400.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-2 to offset 285007469.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-1 to offset 285007942.
18|stream  | 22/06/28 21:44:00 INFO Fetcher: [Consumer clientId=consumer-1, groupId=testid1] Resetting offset for partition applog-0 to offset 316632870.
18|stream  | 22/06/28 21:44:00 INFO JobScheduler: Added jobs for time 1656452640000 ms

resetting Kafka offset repeats forever without even an error.

I did these actions to solve the problem but it did not any help:

reset Kafka offset to earliest or latest
delete consumer group and create new one
I even changed the topic but nothing changed, so I guessed it was from spark cluster, but I can load Kafka data with Pyspark shell on the same cluster

notes:

the app was working OK about 3 years!
recently we had server migration and some of our resources has been decreased
other non-streaming jobs run on spark cluster without any issue

Is there anything that I'm missing?

Answer 1

Can you confirm what was the value set in AUTO OFFSET RESET before the upgrade? Also you can inspect the offset against the consumer group on that particular topic executing the following command.

bin/kafka-consumer-groups — bootstrap-server localhost:9092 — describe — group consumer-group

Also, check for any KAFKA Upgrade wrt broker configuration changes recently.

There are edge case scenarios also for this behaviour due to any poison pills or changes in the consumer behaviour. Because, there are many factors when auto.offset.reset property will kicks in efficiently.

One such case from the doc,

There is an edge case that could result in data loss, whereby a message is not redelivered in a retryable exception scenario. This scenario applies to a new consumer group that is yet to have recorded any current offset (or the offset has been deleted).

Two consumer instances, A and B, join a new consumer group.
The consumer instances are configured with auto.offset.reset as latest (ie new messages only).
Consumer A consumes a new message from the topic partition.
Consumer A dies before processing of the message has completed. The consumer offsets are not updated to mark the message as consumed.
The consumer group rebalances, and Consumer B is assigned to the topic partition.
As there is no valid offset, and auto.offset.reset is set to latest, the message is not consumed.

Answer 2

It was from my redis server. After migration it became unstable and sometimes it was not working correctly. so I restarted the service and everything worked fine.

Spark streaming app resets kafka offset continuously

Question

2 answers

solution1
0 2022-06-29 04:13:15

solution2
0 2022-07-16 19:05:00

Spark streaming app resets kafka offset continuously

Question

2 answers

solution1 0 2022-06-29 04:13:15

solution2 0 2022-07-16 19:05:00

solution1
0 2022-06-29 04:13:15

solution2
0 2022-07-16 19:05:00