
How to use Airflow to restart a failed structured streaming spark job?

I need to run a structured streaming Spark job in AWS EMR. As a resilience requirement, if the Spark job fails for some reason, we hope it can be recreated in EMR. This is similar to task orchestration in ECS, which can restart a task if its health check fails. However, EMR is more of a compute engine than an orchestration system.

I am looking for a big data workflow orchestration tool, such as Airflow. However, Airflow does not support cycles in a DAG. How can I implement something like the following?

step_adder (EmrAddStepsOperator) >> step_checker (EmrStepSensor) >> step_adder (EmrAddStepsOperator)

What is the suggested way to improve job-level resilience like this? Any comments are welcome!
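For reference, the linear part of that pattern is straightforward to express; a minimal sketch (the cluster ID, DAG name, and step definition below are placeholders, and the operators come from the Amazon provider package):

from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# Placeholder values for illustration only.
JOB_FLOW_ID = "j-XXXXXXXXXXXXX"
SPARK_STEPS = [
    {
        "Name": "structured-streaming-job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", "s3://my-bucket/job.py"],
        },
    }
]

with DAG("emr_streaming_job", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    # Submit the Spark step to the running EMR cluster.
    step_adder = EmrAddStepsOperator(
        task_id="step_adder",
        job_flow_id=JOB_FLOW_ID,
        steps=SPARK_STEPS,
    )

    # Wait for that step to finish; the step ID is pulled from XCom.
    step_checker = EmrStepSensor(
        task_id="step_checker",
        job_flow_id=JOB_FLOW_ID,
        step_id="{{ task_instance.xcom_pull(task_ids='step_adder', key='return_value')[0] }}",
    )

    # What I cannot express is the cycle back to step_adder on failure.
    step_adder >> step_checker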

Some of the resilience is already covered by Apache Spark itself (jobs submitted with spark-submit); however, when you want to interact with different processes that are not within Spark, then Airflow might be a solution. In your case, a Sensor can help detect whether a certain condition has occurred, and based on that you can decide what to do in the DAG. Here is a simple HttpSensor that waits for a batch job and checks whether it finished successfully:

from airflow.providers.http.sensors.http import HttpSensor

# Poll the Spark web UI until check_spark_status (a user-defined callable)
# reports that the batch application has finished successfully.
wait_batch_to_finish = HttpSensor(
    http_conn_id='spark_web',
    task_id="wait_batch_to_finish",
    method="GET",
    headers={"Content-Type": "application/json"},
    endpoint="/json",
    response_check=lambda response: check_spark_status(response, "{{ ti.xcom_pull('batch_intel_task')}}"),
    poke_interval=60,
    dag=dag
)
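The check_spark_status callable is not shown in the answer. A minimal sketch of what it might look like, assuming the endpoint is a Spark standalone master's /json page (which lists completed applications with name and state fields) and that the value pulled from XCom is the application name; both are assumptions, not part of the original answer:

def check_spark_status(response, app_name):
    # Parse the master's /json payload and look for our application
    # among the completed apps; treat state FINISHED as success.
    body = response.json()
    for app in body.get("completedapps", []):
        if app.get("name") == app_name:
            return app.get("state") == "FINISHED"
    # Not finished yet (still running or not yet submitted): keep poking.
    return False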
