
How to make an Airflow DAG wait for a VM to finish its job before doing the next task

High-level description of my workflow:

What my VM does -- get data from GCS, process the data, save the processed data back to GCS.

What my DAG currently does -- start the VM >> stop the VM >> do the rest of the data transformation job.

When I run the above DAG, it starts the VM and stops it right after that. I want my DAG to wait for the VM to finish its job before moving on.
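For context, a rough sketch of the current DAG wiring (the Compute Engine operators are from the Google provider; the project, zone, and instance names here are placeholders, not my real values):

from airflow.providers.google.cloud.operators.compute import (
    ComputeEngineStartInstanceOperator,
    ComputeEngineStopInstanceOperator,
)

# Start and stop the VM; nothing in this chain waits for the processing to finish.
start_the_vm = ComputeEngineStartInstanceOperator(
    task_id="start_the_vm",
    project_id="my-project",      # placeholder
    zone="us-central1-a",         # placeholder
    resource_id="my-vm",          # placeholder instance name
)

stop_the_vm = ComputeEngineStopInstanceOperator(
    task_id="stop_the_vm",
    project_id="my-project",
    zone="us-central1-a",
    resource_id="my-vm",
)

start_the_vm >> stop_the_vm >> do_the_rest_of_data_transformation_job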

Note: Kubernetes/Cloud Run is not an option for me.

If you know the name of the output file which your VM will write to GCS, you can add a GCSObjectExistenceSensor between start_vm and stop_vm. It checks every X seconds whether the output file has been created; once it is created, the sensor is marked as succeeded and the stop_vm task starts:

from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

# Polls GCS until the output object exists, then succeeds
gcs_output_sensor = GCSObjectExistenceSensor(
    task_id="gcs_output_sensor",
    bucket="bucket_name",
    object="path/to/file",
)

start_the_vm >> gcs_output_sensor >> stop_the_vm >> do_the_rest_of_data_transformation_job
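You can also tune poke_interval (how often the sensor checks, in seconds) and timeout on the sensor so it fails instead of polling forever if the VM never produces the file.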

If not, then instead of running the processing script automatically when the VM boots, you can start the VM and use the SSHOperator to run the script. In this case the SSH task will wait for the script to finish, and the downstream tasks will wait for the SSH task.
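A minimal sketch of that approach, assuming an Airflow SSH connection named vm_ssh pointing at the VM and a processing script at /opt/scripts/process_data.sh (both are placeholders):

from airflow.providers.ssh.operators.ssh import SSHOperator

# Runs the processing script on the VM over SSH; the task stays running
# until the remote command exits, so downstream tasks wait for it.
run_processing = SSHOperator(
    task_id="run_processing",
    ssh_conn_id="vm_ssh",                          # placeholder SSH connection
    command="bash /opt/scripts/process_data.sh",   # placeholder script path
)

start_the_vm >> run_processing >> stop_the_vm >> do_the_rest_of_data_transformation_job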
