
Understanding Flink savepoints & checkpoints

Consider an Apache Flink streaming application with a pipeline like this:

Kafka-Source -> flatMap 1 -> flatMap 2 -> flatMap 3 -> Kafka-Sink

where every flatMap function is a non-stateful operator (e.g. the normal .flatMap function of a DataStream).

How do checkpoints/savepoints work in case an incoming message is pending at flatMap 3? Will the message be reprocessed after a restart beginning from flatMap 1, or will it skip to flatMap 3?

I am a bit confused, because the documentation seems to refer to application state as something I can use in stateful operators, but I don't have stateful operators in my application. Is the "processing progress" saved and restored at all, or will the whole pipeline be re-processed after a failure/restart?

And is there a difference between a failure (Flink restores from a checkpoint) and a manual restart using savepoints regarding my previous questions?

I tried to find out myself (with checkpointing enabled using EXACTLY_ONCE and the RocksDB backend) by placing a Thread.sleep() in flatMap 3 and then cancelling the job with a savepoint. However, this led to the Flink command-line tool hanging until the sleep was over, and even then flatMap 3 was executed and its output was even sent to the sink before the job got cancelled. So it seems I cannot manually force this situation to analyze Flink's behaviour.

In case the "processing progress" is not saved/covered by the checkpoints/savepoints as described above, how could I make sure, for every message reaching my pipeline, that any given operator (flatMap 1/2/3) is never re-processed in a restart/failure situation?

When a checkpoint is taken, every task (a parallel instance of an operator) checkpoints its state. In your example, the three flatMap operators are stateless, so there is no state to be checkpointed. The Kafka source is stateful and checkpoints the reading offsets for all partitions.

In case of a failure, the job is recovered and all tasks load their state, which means, in the case of the source operator, that the reading offsets are reset. Hence, the application will reprocess all events since the last checkpoint.
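This replay behaviour can be illustrated with a small simulation (plain Python, not Flink code; the operator and variable names are made up for illustration). Only the source offset is checkpointed, so after a failure every event since the last completed checkpoint flows through all three stateless operators again, end to end:

```python
# Simulation of checkpoint recovery: the "Kafka source" checkpoints its offset,
# the stateless flatMaps checkpoint nothing, and recovery rewinds the offset.

def flat_map_1(e): return e   # stateless: no state to checkpoint
def flat_map_2(e): return e
def flat_map_3(e): return e

events = ["a", "b", "c", "d", "e"]   # the simulated Kafka topic
checkpointed_offset = 0              # offset stored by the last completed checkpoint
processed = []                       # everything that reached the sink

def run(start_offset, fail_at=None):
    """Process events from start_offset; optionally fail before a given offset."""
    global checkpointed_offset
    offset = start_offset
    for e in events[start_offset:]:
        if offset == fail_at:
            raise RuntimeError("simulated failure")
        processed.append(flat_map_3(flat_map_2(flat_map_1(e))))
        offset += 1
        if offset % 2 == 0:          # pretend a checkpoint completes every 2 events
            checkpointed_offset = offset

try:
    run(0, fail_at=3)                # "a", "b", "c" reach the sink, then the job fails
except RuntimeError:
    pass                             # last completed checkpoint stored offset 2

run(checkpointed_offset)             # recovery: restart from offset 2, not from "d"

print(processed)                     # ['a', 'b', 'c', 'c', 'd', 'e']
```

Note that "c" reaches the sink twice: it was emitted before the failure, and it is reprocessed from flatMap 1 after recovery because only the offset from the last checkpoint survives. Without a transactional or idempotent sink, the sink therefore sees at-least-once output.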

In order to achieve end-to-end exactly-once, you need a special sink connector that either offers transaction support (e.g., for Kafka) or supports idempotent writes.
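The idempotent-write option can be sketched as follows (again plain Python, not a real Flink connector; the record-ID scheme is an assumption): if every record carries a stable, unique ID, a replayed duplicate after checkpoint recovery overwrites the same entry and has no visible effect.

```python
# Sketch of an idempotent sink: writes are keyed by a unique record ID,
# so duplicates produced by checkpoint replay are absorbed by the store.

class IdempotentSink:
    def __init__(self):
        self.store = {}   # stand-in for an external store keyed by record ID

    def write(self, record_id, value):
        # Writing the same (id, value) twice leaves the store unchanged,
        # which is exactly what makes replays after recovery harmless.
        self.store[record_id] = value

sink = IdempotentSink()
# (2, "b") appears twice, as if it were replayed after a recovery:
for rid, val in [(1, "a"), (2, "b"), (2, "b"), (3, "c")]:
    sink.write(rid, val)

print(sorted(sink.store.items()))   # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The transactional alternative works differently: the sink buffers output in a transaction and only commits it when the corresponding checkpoint completes, so uncommitted duplicates are never visible to downstream consumers.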
