How to make an Airflow DAG wait for a VM to finish its job before running the next task
High-level description of my workflow:

What my VM does: get data from GCS, process the data, and save the processed data back to GCS.

What my DAG currently does: start the VM >> stop the VM >> do the rest of the data-transformation job.

When I run the above DAG, it starts the VM and stops it right after that. I want my DAG to wait for the VM to finish its job.

Note: Kubernetes/Cloud Run is not an option for me.
If you know the name of the output file your VM will write to GCS, you can add a sensor between start_vm and stop_vm that checks every X seconds whether the output file has been created. Once the file exists, the sensor is marked as succeeded and the stop_vm task starts:
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

gcs_output_sensor = GCSObjectExistenceSensor(
    task_id="gcs_output_sensor",
    bucket="bucket_name",
    object="path/to/file",
)
start_the_vm >> gcs_output_sensor >> stop_the_vm >> do_the_rest_of_data_transformation_job
If not, then instead of running the processing script automatically when the VM boots, you can start the VM and use the SSHOperator to run the script. In this case the SSH task will wait for the script to finish, and the downstream tasks will wait for it.