How to use Airflow to restart a failed Structured Streaming Spark job?
I need to run a Structured Streaming Spark job on AWS EMR. As a resilience requirement, if the Spark job fails for any reason, we want it to be recreated in EMR. This is similar to task orchestration in ECS, which can restart a task when its health check fails. However, EMR is more of a compute engine than an orchestration system.

I am looking for a big data workflow orchestration tool, such as Airflow. However, it does not support cycles in a DAG. How can I implement something like the following?

step_adder (EmrAddStepsOperator) >> step_checker (EmrStepSensor) >> step_adder (EmrAddStepsOperator)

What is the suggested way to improve this kind of job-level resilience? Any comments are welcome!
Some resilience is already covered by Apache Spark itself (for jobs submitted with spark-submit). However, when you want to interact with other processes that are not within Spark, Airflow might be a solution. In your case, a Sensor can help detect whether a certain condition has occurred. Based on that, you can decide what to do next in the DAG. Here is a simple HttpSensor that waits for a batch job and checks whether it finished successfully:
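from airflow.providers.http.sensors.http import HttpSensor

# check_spark_status is a helper that is not shown here; it should return True once the job is done.
# `dag` and the upstream task "batch_intel_task" are assumed to be defined elsewhere in the DAG file.
wait_batch_to_finish = HttpSensor(
    task_id="wait_batch_to_finish",
    http_conn_id="spark_web",
    method="GET",
    headers={"Content-Type": "application/json"},
    endpoint="/json",
    # response_check is not a templated field, so a Jinja string inside the lambda would be
    # passed literally. Pull the XCom from the task instance instead (this assumes the
    # Airflow 2 HTTP provider, which passes context values such as `ti` by keyword).
    response_check=lambda response, ti: check_spark_status(
        response, ti.xcom_pull(task_ids="batch_intel_task")
    ),
    poke_interval=60,  # seconds between polls
    dag=dag,
)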
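The check_spark_status helper is not shown in the snippet. A minimal sketch of what it might look like is below. It assumes the polled endpoint returns JSON with a `completedapps` list whose entries carry `id` and `state` fields, and that the XCom value is the Spark application id. Those field names and the `FINISHED` state value are assumptions, so verify them against what your Spark web UI actually returns.

```python
def check_spark_status(response, app_id):
    """Return True once the application identified by app_id has finished successfully.

    `response` is expected to behave like a requests.Response (only .json() is used).
    The keys "completedapps", "id" and "state" are assumed, not confirmed by the answer.
    """
    payload = response.json()
    for app in payload.get("completedapps", []):
        if app.get("id") == app_id:
            state = app.get("state")
            if state == "FINISHED":
                return True
            # The application ended, but not successfully: raise so the sensor
            # task fails right away instead of polling until it times out.
            raise RuntimeError(f"Spark application {app_id} ended in state {state}")
    # Not in the completed list yet, so keep poking.
    return False
```

Returning False makes the sensor poke again after poke_interval, while raising fails the task attempt, which gives the DAG something to react to (for example through the task's retries setting).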