How to run Airflow tasks synchronously
I have an Airflow DAG consisting of 2-3 steps.
What happens is that the DAG run completes within seconds, even while the Athena query step is still running.
I want to make sure that the further steps run only after the file has been generated. Basically, I want this to be synchronous.
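You can set up the tasks like this:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def athena_task():
    # Add your code. If it starts an Athena query, it must also wait for the query to finish.
    return


with DAG('athena_pipeline', start_date=datetime(2021, 1, 1), catchup=False) as dag:  # example DAG id
    t1 = PythonOperator(
        task_id='athena_task',
        python_callable=athena_task,
    )
    t2 = BashOperator(
        task_id='variable_task',
        bash_command='',  # replace with relevant command
    )
    t3 = BashOperator(
        task_id='process_task',
        bash_command='',  # replace with relevant command
    )

    # t2 starts only after t1 succeeds; t3 starts only after t2 succeeds
    t1 >> t2 >> t3
```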
`t2` will run only after `t1` has completed successfully, and `t3` will start only after `t2` has completed successfully.
Note that Airflow has `AWSAthenaOperator`, which might save you the trouble of writing the code yourself. The operator submits a query to Athena and saves the output to the S3 path given by the `output_location` parameter:
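```python
# Newer versions of the Amazon provider name this class AthenaOperator
from airflow.providers.amazon.aws.operators.athena import AWSAthenaOperator

run_query = AWSAthenaOperator(
    task_id='athena_task',
    query='SELECT * FROM my_table',
    output_location='s3://some-bucket/some-path/',
    database='my_database',
)
```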
Athena's query API is asynchronous. You start a query, get an ID back, and then you need to poll with the `GetQueryExecution` API call until the query has completed.
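A minimal sketch of that polling loop, assuming `client` is a boto3 Athena client (for example `boto3.client("athena")`). The function name `run_athena_query_sync` and the 5-second poll interval are illustrative choices, not part of the original answer. Called from a `PythonOperator` callable such as `athena_task`, it keeps the task running until the query has finished, and the raised exception marks the task as failed so downstream tasks do not start.

```python
import time

# Athena reports one of these once a query is no longer QUEUED or RUNNING.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query_sync(client, query, database, output_location, poll_seconds=5.0):
    """Start an Athena query and block until it reaches a terminal state.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    Returns the query execution ID on success; raises RuntimeError otherwise.
    """
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]  # the ID Athena hands back immediately

    # Poll GetQueryExecution until the query is no longer QUEUED/RUNNING.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        # Raising makes the Airflow task fail, so downstream tasks do not start.
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} ended in state {status['State']}: {reason}")
    return query_id
```

Inside `athena_task` you would call it as, for example, `run_athena_query_sync(boto3.client("athena"), "SELECT * FROM my_table", "my_database", "s3://some-bucket/some-path/")`.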
If you only start the query in the first task, there is no guarantee that the query has completed when the next task runs. Only once `GetQueryExecution` has returned a final status of `SUCCEEDED` (or `FAILED` / `CANCELLED`) has the query finished, and only `SUCCEEDED` means you can expect the output file to exist.
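To make that rule concrete, here is a small sketch that inspects one `GetQueryExecution` response, assumed to be the dict returned by boto3's `get_query_execution(QueryExecutionId=...)`. The helper name `output_file_if_ready` is hypothetical. It reads the result file's S3 path from `ResultConfiguration.OutputLocation`, which Athena fills in for the execution.

```python
def output_file_if_ready(execution):
    """Return the S3 path of the result file, or None while the query is still running.

    `execution` is assumed to be the dict returned by a boto3 Athena client's
    get_query_execution(QueryExecutionId=...) call.
    """
    query = execution["QueryExecution"]
    state = query["Status"]["State"]
    if state in ("QUEUED", "RUNNING"):
        return None  # not finished yet: keep polling
    if state != "SUCCEEDED":
        # FAILED or CANCELLED: the query is finished, but there is no result file to use
        raise RuntimeError(f"Athena query ended in state {state}")
    return query["ResultConfiguration"]["OutputLocation"]
```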
As @Elad points out, `AWSAthenaOperator` does this for you, handles error cases, and more.