
Airflow xcom_pull(): accessing the implicitly returned value of an upstream task

I am new to Python and new to Airflow.

I am using the Snowflake database.

I have created an operator, SnowflakeGetDataOperator, that returns the result of the Snowflake hook.get_records method (I am returning a small number of lines - usually a single cell).
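
Roughly, the operator looks something like this (a simplified sketch, not the exact code):

from airflow.contrib.hooks.snowflake_hook import SnowflakeHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class SnowflakeGetDataOperator(BaseOperator):
    """Run a query and return the fetched records."""

    @apply_defaults
    def __init__(self, sql, snowflake_conn_id='snowflake_default', *args, **kwargs):
        super(SnowflakeGetDataOperator, self).__init__(*args, **kwargs)
        self.sql = sql
        self.snowflake_conn_id = snowflake_conn_id

    def execute(self, context):
        hook = SnowflakeHook(snowflake_conn_id=self.snowflake_conn_id)
        # get_records returns a list of row tuples; because it is returned
        # here, Airflow stores it as this task's XCom 'return_value'
        return hook.get_records(self.sql)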

So now I have this task in the DAG:

check_last_run_date = SnowflakeGetDataOperator(
    task_id='check_last_run_date',
    sql="SELECT COALESCE(max(update_date), '2000-01-01') FROM poc.dwh.fact_collector",
    snowflake_conn_id='snowflake_default',
    dag=dag)

When this task runs, I can see in the Airflow backend the XCom object of this task (the returned value of the operator - I did not use xcom_push()).

My question is how do I access this value from the next downstream task?

I need to use it as a parameter for my next sql operator.

I have tried the following line within the DAG code:

{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}

but the code doesn't recognize the task_instance attribute.

EDIT

The next task should be something like:

fill_agg_table = SnowflakeOperator(
    task_id='fill_cust_agg_data',
    sql=str.replace(
        """INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
          ( SELECT * FROM POC.STG."stg_atg_data" WHERE XXXXX < current_date)""",
        'XXXXX',
        "{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}"),
    snowflake_conn_id='snowflake_default',
    dag=dag)

Late, but your title is the answer:

xcom_pull() with no arguments will return the latest return_value for the DAG run, hence the value pushed by the immediate upstream task, assuming there is only one upstream task.

It is not explicit in the documentation but I like that better than hard-coding a task name.
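
For example, you can pull the value in a downstream PythonOperator without naming the upstream task (a sketch, not from the original question; provide_context exposes the task instance as 'ti' in the callable's kwargs):

from airflow.operators.python_operator import PythonOperator


def use_last_run_date(**kwargs):
    # No task_ids given: this picks up the latest return_value of the DAG run,
    # i.e. the records returned by check_last_run_date
    last_run_date = kwargs['ti'].xcom_pull()
    print('last run date: {}'.format(last_run_date))


use_it = PythonOperator(
    task_id='use_last_run_date',
    python_callable=use_last_run_date,
    provide_context=True,
    dag=dag)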

Your second task looks a bit unusual. If the field is templated, you can simply put the template expression straight into the string.

In fact, using str.replace or str.format will interfere with your macros and not work well in Airflow. Other macros are listed here: https://airflow.apache.org/code.html#macros

Make sure that you template the sql field in your own operator. For an example of how to do this, see https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/bigquery_operator.py and check the variable template_fields.
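
Concretely, that means the operator class declares the field, something like this (a sketch; the rest of the class stays as it is):

from airflow.models import BaseOperator


class SnowflakeGetDataOperator(BaseOperator):
    # Fields listed here are rendered as Jinja templates before execute() runs,
    # so expressions such as {{ task_instance.xcom_pull(...) }} inside sql are
    # resolved to real values at runtime.
    template_fields = ('sql',)
    template_ext = ('.sql',)  # also render files ending in .sql passed as sql

    # __init__ and execute unchanged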

Suggestion:

sql= """INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data 
  ( SELECT * FROM POC.STG."stg_atg_data" WHERE {{ task_instance.xcom_pull(task_ids='check_last_run_date') }} < current_date)""", 
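
Putting it together as a complete task (a sketch, assuming the contrib SnowflakeOperator, whose sql field is already templated):

from airflow.contrib.operators.snowflake_operator import SnowflakeOperator

fill_agg_table = SnowflakeOperator(
    task_id='fill_cust_agg_data',
    sql="""INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
      ( SELECT * FROM POC.STG."stg_atg_data"
        WHERE {{ task_instance.xcom_pull(task_ids='check_last_run_date') }} < current_date)""",
    snowflake_conn_id='snowflake_default',
    dag=dag)

# the XCom must exist before it is pulled, so order the tasks explicitly
check_last_run_date >> fill_agg_table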
