
How to safely update jobs in-flight using Apache Flink on AWS EMR?

I was not able to find instructions on how to safely update a running job's code. I see the Flink docs on how to use savepoints, but I'd expect a straightforward recipe for updating Flink jobs on AWS EMR.

https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/aws.html

https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/upgrading.html

https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html

I was expecting instructions like the following, except those are written for Dataflow and Apache Beam rather than Flink:

https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline

https://medium.com/google-cloud/restarting-cloud-dataflow-in-flight-9c688c49adfd

To achieve that, you need to cancel your job with a savepoint, either via the Flink command line interface or via the REST API. In both cases you end up with the path to the savepoint (with the REST API you first receive a request-id, since cancellation with a savepoint is an asynchronous operation, but you can use it to retrieve the savepoint path once the operation completes).
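In the Flink 1.9/1.10 versions linked above, the CLI form is `flink cancel -s <targetDirectory> <jobId>` (newer releases prefer `flink stop -p <targetDirectory> <jobId>`). Below is a minimal sketch of the same operation against the REST API, assuming the JobManager is reachable at `http://localhost:8081`; the job id and S3 target directory are placeholders:

```python
# Sketch: trigger "savepoint + cancel" via the Flink REST API, then poll
# the async trigger until the savepoint path is available.
import time
import requests

JOBMANAGER = "http://localhost:8081"             # assumption: Flink REST endpoint
JOB_ID = "<your-job-id>"                         # placeholder
TARGET_DIR = "s3://my-bucket/flink-savepoints"   # placeholder

# Trigger an asynchronous savepoint; cancel-job=True cancels the job
# once the savepoint has been written.
resp = requests.post(
    f"{JOBMANAGER}/jobs/{JOB_ID}/savepoints",
    json={"target-directory": TARGET_DIR, "cancel-job": True},
)
resp.raise_for_status()
trigger_id = resp.json()["request-id"]

# Poll the trigger until the operation is done, then read the path.
while True:
    status = requests.get(
        f"{JOBMANAGER}/jobs/{JOB_ID}/savepoints/{trigger_id}"
    ).json()
    if status["status"]["id"] == "COMPLETED":
        savepoint_path = status["operation"]["location"]
        break
    time.sleep(2)

print("Savepoint written to:", savepoint_path)
```

If the trigger fails, the `operation` object in the status response carries a `failure-cause` instead of a `location`, so a production script should check for that as well.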

After getting the savepoint path, you can start the new job, again via either the REST API or the CLI. Provide the savepoint path when submitting the job, and Flink will automatically restore the state from the savepoint, including all records that were in flight.
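The CLI equivalent is `flink run -s <savepointPath> new-job.jar`. On the REST side, the usual flow is to upload the new jar first (`POST /jars/upload`) and then run it with the savepoint path. A sketch, with the jar id and savepoint path as placeholders carried over from the previous step:

```python
# Sketch: start the upgraded job from the savepoint via the REST API,
# assuming the new jar was already uploaded with POST /jars/upload.
import requests

JOBMANAGER = "http://localhost:8081"   # assumption: Flink REST endpoint
JAR_ID = "<uploaded-jar-id>"           # id returned by POST /jars/upload
savepoint_path = "s3://my-bucket/flink-savepoints/savepoint-abc123"  # from the cancel step

resp = requests.post(
    f"{JOBMANAGER}/jars/{JAR_ID}/run",
    json={
        "savepointPath": savepoint_path,
        # allowNonRestoredState lets the job start even if the new code
        # dropped some stateful operators; enable only if that is intended.
        "allowNonRestoredState": False,
    },
)
resp.raise_for_status()
print("New job id:", resp.json()["jobid"])
```

Setting `allowNonRestoredState` explicitly is worth the extra line: restoring silently without some operator state is rarely what you want during an upgrade.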
