
How to use Airflow to restart a failed Structured Streaming Spark job?

I need to run a Structured Streaming Spark job on AWS EMR. As a resilience requirement, if the Spark job fails for some reason, we want it to be recreated on EMR. This is similar to task orchestration in ECS, which can restart a task when its health check fails. However, EMR is more of a compute engine than an orchestration system.

I am looking at big data workflow orchestration tools such as Airflow. However, Airflow does not support cycles in a DAG. How can I implement something like the following?

step_adder (EmrAddStepsOperator) >> step_checker (EmrStepSensor) >> step_adder (EmrAddStepsOperator).
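
For reference, the acyclic part of that chain, using the EMR operators named above, looks roughly like the sketch below. The cluster ID, step definition, and connection IDs are placeholders, and the import paths differ across Airflow / Amazon provider versions (older releases use airflow.contrib.operators.emr_add_steps_operator and airflow.contrib.sensors.emr_step_sensor).

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# Placeholder step definition: submit the streaming application via spark-submit.
SPARK_STEPS = [
    {
        "Name": "structured-streaming-job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", "s3://my-bucket/my_job.py"],
        },
    }
]

dag = DAG(
    "emr_structured_streaming",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
)

step_adder = EmrAddStepsOperator(
    task_id="step_adder",
    job_flow_id="j-XXXXXXXXXXXX",   # placeholder EMR cluster ID
    steps=SPARK_STEPS,
    aws_conn_id="aws_default",
    dag=dag,
)

step_checker = EmrStepSensor(
    task_id="step_checker",
    job_flow_id="j-XXXXXXXXXXXX",   # placeholder EMR cluster ID
    # EmrAddStepsOperator pushes the IDs of the steps it created to XCom.
    step_id="{{ task_instance.xcom_pull(task_ids='step_adder', key='return_value')[0] }}",
    aws_conn_id="aws_default",
    dag=dag,
)

# Airflow DAGs are acyclic, so the chain cannot loop back to step_adder.
step_adder >> step_checker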

What is the suggested way to improve this kind of job-level resilience? Any comments are welcome!

Some of this resilience is already covered by Apache Spark itself (for jobs submitted with spark-submit). However, when you want to interact with processes that live outside Spark, Airflow might be a solution. In your case, a Sensor can detect whether a certain condition has happened, and you can decide what the DAG does next based on that. Here is a simple HttpSensor that polls a batch job to check whether it has finished successfully:

from airflow.providers.http.sensors.http import HttpSensor  # airflow.sensors.http_sensor in Airflow 1.10

# Poll the Spark web endpoint every 60 seconds until the batch reports success.
wait_batch_to_finish = HttpSensor(
    task_id="wait_batch_to_finish",
    http_conn_id='spark_web',
    method="GET",
    headers={"Content-Type": "application/json"},
    endpoint="/json",
    # check_spark_status inspects the response for the batch submitted by 'batch_intel_task'.
    response_check=lambda response: check_spark_status(response, "{{ ti.xcom_pull('batch_intel_task') }}"),
    poke_interval=60,
    dag=dag
)
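
The response_check callable above refers to a check_spark_status helper that is not shown. A hypothetical sketch, assuming the spark_web connection points at a Spark standalone master UI whose /json endpoint lists applications with a name and a state, might look like the following. Note that Airflow renders Jinja only in templated fields, so in practice the XCom value may need to be pulled inside the helper rather than passed through the template string in the lambda.

def check_spark_status(response, batch_id):
    """Return True once the given application shows up as successfully finished."""
    body = response.json()
    # The standalone master's /json endpoint exposes "activeapps" and
    # "completedapps" lists; adjust the parsing to whatever your endpoint returns.
    completed = body.get("completedapps", [])
    return any(
        app.get("name") == batch_id and app.get("state") == "FINISHED"
        for app in completed
    )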
