
Use xcom_pull to pull a key's value that the same task pushed - Airflow

I've built an Airflow operator that executes an HTTP request against a cloud API. After executing a request I get a run_id that identifies the execution, and I then keep checking the request status until it finishes (which can take some time).

I'm trying to develop a mechanism that saves this run_id to XCom, to cover network issues or a reclaimed spot instance, when my pod loses its connection to the cloud service. I'm using Airflow retries, so after a failure I want to resume the same connection and keep checking the status of the run_id that was saved to XCom.

I was able to push the run_id to XCom, but when I pull it from the same task I get None instead of the value that I pushed and can see in the Admin dashboard (I could pull it from other tasks in the pipeline).

  • Is there any limitation on pulling an XCom value that was pushed by the same task (in an earlier try) that I'm pulling from?
  • I also have the option of using Airflow Variables instead of XCom, but they don't seem relevant to my use case: they look meant for sharing data between DAGs, whereas I want a task to share data with its own future runs.

I run this code inside the execute function of my operator. This is how I pushed the value:

self.xcom_push(context, key='run_id', value='111')

This is how I pulled the value:

value = self.xcom_pull(context, key='run_id', task_ids=self.task_id, include_prior_dates=True)

but value is always None, unless I pull data from another task, which is not relevant to my use case.

I used Airflow Variables instead of XCom, since it's not possible to pull an XCom value that was pushed by the same task in an earlier try (Airflow clears a task instance's XComs when a new try starts).
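The Variables approach could be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the helper names variable_key and resume_or_start, the key format, and the _submit_request method mentioned in the comments are all hypothetical. The getter and setter are passed in as plain callables so the logic can be run without an Airflow installation; the comments show how they might be wired to Variable.get and Variable.set, assuming Airflow 2.x signatures.

```python
from typing import Callable, Optional


def variable_key(dag_id: str, task_id: str, dag_run_id: str) -> str:
    """Build a Variable name that is unique per task and per DAG run.

    The naming scheme is an assumption made for this sketch.
    """
    return f"cloud_run_id__{dag_id}__{task_id}__{dag_run_id}"


def resume_or_start(
    get_var: Callable[[str], Optional[str]],
    set_var: Callable[[str, str], None],
    key: str,
    submit: Callable[[], str],
) -> str:
    """Return the cloud run_id saved by an earlier try, or submit a new request.

    get_var must return None when nothing has been stored under key.
    """
    saved = get_var(key)
    if saved is not None:
        # A previous try already submitted the request: keep polling that one.
        return saved
    cloud_run_id = submit()
    # Persist immediately, before polling, so a retry can find it.
    set_var(key, cloud_run_id)
    return cloud_run_id


# Possible wiring inside an operator's execute(context), assuming Airflow 2.x
# (not exercised here; self._submit_request is a hypothetical method):
#
#   from airflow.models import Variable
#   key = variable_key(context["dag"].dag_id, self.task_id, context["run_id"])
#   cloud_run_id = resume_or_start(
#       lambda k: Variable.get(k, default_var=None),
#       Variable.set,
#       key,
#       self._submit_request,
#   )
```

Including the DAG run ID in the key keeps different DAG runs from reading each other's value. Variables are not cleaned up automatically, so the operator would probably want to delete the key once the request completes.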

Supposing that you have a folder shared between all workers, an alternative solution is to use a file to store whatever you need to pass between task runs, including multiple runs of the same task. This is a good replacement for XCom overall, since XCom goes through the database and is usually more expensive than simple file operations. Just create a subfolder named after your DAG run ID and use it across all the tasks of your DAG.
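A minimal sketch of the file-based idea, using only the standard library. The function names, the JSON layout, and the one-file-per-task naming are assumptions made for illustration; the shared root folder is whatever path all your workers can see.

```python
import json
import re
from pathlib import Path
from typing import Optional


def state_file(shared_root: Path, dag_run_id: str, task_id: str) -> Path:
    """Return the path of this task's state file inside a per-DAG-run subfolder."""
    # DAG run IDs can contain characters such as ':' and '+', so replace
    # anything outside a conservative set before using it as a folder name.
    safe_run_id = re.sub(r"[^A-Za-z0-9_.-]", "_", dag_run_id)
    run_dir = shared_root / safe_run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir / f"{task_id}.json"


def save_run_id(path: Path, cloud_run_id: str) -> None:
    """Write the cloud run_id, using a temp file plus rename to avoid partial writes."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"run_id": cloud_run_id}))
    tmp.replace(path)


def load_run_id(path: Path) -> Optional[str]:
    """Return the saved cloud run_id, or None if no earlier try stored one."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["run_id"]
```

Inside execute, the operator would call load_run_id first, submit a new request only when it returns None, and call save_run_id right after the request is accepted, so that a retried try picks up the same run_id.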
