airflow xcom.pull() access to an implicitly returned value of upstream task
I am new to Python and new to Airflow. I am using the Snowflake database.
I have created an operator SnowflakeGetDataOperator that returns the result of the Snowflake hook.get_records method (I am returning a small number of lines, usually a single cell).
So now I have this task in the DAG:
check_last_run_date = SnowflakeGetDataOperator(
    task_id='check_last_run_date',
    sql="SELECT COALESCE(max(update_date), '2000-01-01') FROM poc.dwh.fact_collector",
    snowflake_conn_id='snowflake_default',
    dag=dag)
When this task runs, I see in the Airflow backend the XCom object of this task (the returned value of the operator; I did not use xcom.push()).
My question is: how do I access this value from the next downstream task? I need to use it as a parameter for my next SQL operator.
I have tried the following line within the DAG code:

{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}

but the code doesn't recognize the task_instance attribute.
EDIT
The next task should be something like:
fill_agg_table = SnowflakeOperator(
    task_id='fill_cust_agg_data',
    sql=str.replace("""INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
                       ( SELECT * FROM POC.STG."stg_atg_data" WHERE XXXXX < current_date)""",
                    'XXXXX',
                    "{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}"),
    snowflake_conn_id='snowflake_default',
    dag=dag)
Late, but your title is the answer: xcom_pull() with no args will return the latest return_value for the DAG run, hence the value pushed by the immediate upstream task, assuming only one task. It is not explicit in the documentation, but I like that better than hard-coding a task name.
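That lookup behaviour can be illustrated with a toy model (this is a sketch only, not Airflow's actual XCom implementation): an operator's return value is stored under the key return_value, and pulling with no task_ids yields the most recently pushed one.

```python
# Toy model of XCom pull semantics -- an illustration only,
# not Airflow's real implementation.
class ToyXCom:
    def __init__(self):
        self._store = []  # (task_id, value) pairs, in push order

    def push(self, task_id, value):
        # An operator's return value is pushed automatically
        # as its 'return_value' XCom.
        self._store.append((task_id, value))

    def pull(self, task_ids=None):
        if task_ids is None:
            # No args: the latest return_value pushed in this DAG run.
            return self._store[-1][1]
        # Otherwise: the value pushed by the named task.
        return next(v for t, v in self._store if t == task_ids)

xcom = ToyXCom()
xcom.push('check_last_run_date', '2000-01-01')
print(xcom.pull())                                # '2000-01-01'
print(xcom.pull(task_ids='check_last_run_date'))  # '2000-01-01'
```

With a single upstream task, both calls return the same value, which is why the no-args form is enough here.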
Your second task looks a bit unusual. If fields are templated, you can simply put the macro into the string.
In fact, using string.replace or string.format will mess up your macros and not work very well in Airflow. Other macros are listed here: https://airflow.apache.org/code.html#macros
Make sure that you template the sql field in your own operator. For how to do this, see this example code https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/bigquery_operator.py and check the variable template_fields.
Suggestion:
sql="""INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
       ( SELECT * FROM POC.STG."stg_atg_data"
         WHERE '{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}' < current_date)""",