The documentation says:
enable.auto.commit: Kafka source doesn't commit any offset.
Hence my question is, in the event of a worker or partition crash/restart :
This seems to be quite important. Any indication on how to deal with it?
I also ran into this issue.
You're right in your observations on the two options, i.e. startingOffsets set to latest and startingOffsets set to earliest.
However...
There is the option of checkpointing by adding the following option:
    .writeStream
    .<something else>
    .option("checkpointLocation", "path/to/HDFS/dir")
    .<something else>
In the event of a failure, Spark would go through the contents of this checkpoint directory and recover the state before accepting any new data.
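To make this concrete, here is a minimal PySpark sketch of a Kafka-to-file query with a checkpoint directory. The broker address (localhost:9092), the topic name (events), the Parquet sink, the output path and the helper names kafka_source_options and start_query are all assumptions for illustration, not something from the question. It also assumes the spark-sql-kafka connector package is on the classpath. As far as I understand the Kafka integration guide, startingOffsets only applies when a query starts for the first time; on restart the offsets stored in the checkpoint take over.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the Kafka source options (hypothetical helper).

    startingOffsets is only consulted when the query starts with an empty
    checkpoint directory; a restarted query resumes from the checkpoint.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_query(spark, checkpoint_dir, output_dir):
    """Start a streaming query that tracks its Kafka offsets in checkpoint_dir.

    `spark` is an existing SparkSession; the broker and topic are assumed values.
    """
    source = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options("localhost:9092", "events"))
        .load()
    )
    return (
        source
        # Kafka keys and values arrive as binary, so cast them to strings.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
        .writeStream
        .format("parquet")
        .option("path", output_dir)
        # Spark records the processed offsets here and reads them on restart.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

With a SparkSession in hand, something like start_query(spark, "path/to/HDFS/dir", "path/to/output/dir").awaitTermination() would run it. Restarting the same job with the same checkpoint directory should continue from where it stopped, which is the behaviour the question is after.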
I found this useful reference on the same.
Hope this helps!