
airflow xcom.pull() access to an implicitly returned value of upstream task

I am new to Python and new to Airflow.

I am using the Snowflake database.

I have created an operator, SnowflakeGetDataOperator, that returns the result of the Snowflake hook.get_records method (I am returning a small number of lines - usually a single cell).

So now I have this task in the dag:

check_last_run_date=SnowflakeGetDataOperator(
    task_id='check_last_run_date',
    sql="SELECT COALESCE (max(update_date), '2000-01-01') FROM poc.dwh.fact_collector",
    snowflake_conn_id='snowflake_default',
    dag=dag)

When this task runs, I see in the Airflow backend the XCom object for this task (the returned value of the operator - I did not use xcom.push()).

My question is: how do I access this value from the next downstream task?

I need to use it as a parameter for my next SQL operator.

I have tried the following line within the dag code:

{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}

but the code doesn't recognize the task_instance attribute.

EDIT

The next task should be something like:

fill_agg_table = SnowflakeOperator(
    task_id='fill_cust_agg_data',
    sql=str.replace("""INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
      ( SELECT * FROM POC.STG."stg_atg_data" WHERE XXXXX < current_date)""",
        'XXXXX',
        "{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}"),
    snowflake_conn_id='snowflake_default',
    dag=dag)

Late, but your title is the answer:

xcom_pull() with no args will return the latest return_value for the DAG run, hence the value pushed by the immediate upstream task, assuming there is only one.
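As a sketch of the downstream side (the function name is illustrative, not from the question), a PythonOperator's python_callable could pull the value from the context that Airflow injects:

```python
# Illustrative sketch: a downstream python_callable that reads the
# upstream task's return value via XCom. Airflow passes the running
# TaskInstance into the callable's keyword arguments as "ti".
def use_last_run_date(**context):
    ti = context["ti"]
    # ti.xcom_pull() with no args returns the latest return_value in
    # the DAG run; passing task_ids pins it to a specific upstream task.
    last_run_date = ti.xcom_pull(task_ids="check_last_run_date")
    # Build a SQL fragment from the pulled value (hypothetical usage).
    return f"update_date > '{last_run_date}'"
```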

It is not explicit in the documentation, but I like that better than hard-coding a task name.

Your second task looks a bit unusual. If fields are templated, you can simply put the Jinja template straight into the string.

In fact, using string.replace or string.format will mess up your macros and not work very well in Airflow. Other macros are listed here: https://airflow.apache.org/code.html#macros

Make sure that you template the sql field in your own operator. For how to do this, see this example code https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/bigquery_operator.py and check the variable template_fields.
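For illustration only, here is a minimal sketch of that mechanism outside Airflow: a real operator would subclass airflow.models.BaseOperator, and Airflow itself does the rendering, but the idea is that every attribute named in template_fields gets Jinja-rendered before execute() runs:

```python
from jinja2 import Template


class SnowflakeGetDataOperator:
    # Sketch only: a real operator subclasses airflow.models.BaseOperator.
    # Airflow Jinja-renders every attribute named in template_fields
    # before execute() is called.
    template_fields = ("sql",)

    def __init__(self, sql):
        self.sql = sql


def render_template_fields(operator, context):
    # Simplified stand-in for what Airflow does at task render time.
    for field in operator.template_fields:
        raw = getattr(operator, field)
        setattr(operator, field, Template(raw).render(**context))
```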

Suggestion:

sql="""INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
  ( SELECT * FROM POC.STG."stg_atg_data" WHERE '{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}' < current_date)""",
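To see the rendering in action outside Airflow, you can run the template through Jinja2 directly with a stand-in task_instance (the date value below is made up; in a real run Airflow supplies the TaskInstance and the value comes from the upstream task's XCom):

```python
from jinja2 import Template


class FakeTaskInstance:
    # Stand-in for Airflow's TaskInstance, for illustration only.
    def xcom_pull(self, task_ids=None):
        return "2000-01-01"  # made-up value an upstream task might return


sql = """SELECT * FROM POC.STG."stg_atg_data"
WHERE '{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}' < current_date"""

# Airflow performs an equivalent render on templated fields before execute().
rendered = Template(sql).render(task_instance=FakeTaskInstance())
print(rendered)
```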
