I am new to Python and new to Airflow.
I am using the Snowflake database.
I have created an operator, SnowflakeGetDataOperator, that returns the result of the Snowflake hook's get_records method (I am returning a small number of lines, usually a single cell).
So now I have this task in the DAG:
check_last_run_date = SnowflakeGetDataOperator(
    task_id='check_last_run_date',
    sql="SELECT COALESCE (max(update_date), '2000-01-01') FROM poc.dwh.fact_collector",
    snowflake_conn_id='snowflake_default',
    dag=dag)
When this task runs, I see the XCom object of this task in the Airflow backend (the returned value of the operator; I did not use xcom_push()).
My question is how do I access this value from the next downstream task?
I need to use it as a parameter for my next sql operator.
I have tried the following line within the dag code
{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}
but the code doesn't recognize the task_instance attribute.
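A likely cause of that error, sketched under the assumption that the line was written as bare Python in the DAG file rather than inside a string: outside a string, Python reads {{ ... }} as one set literal nested in another and tries to evaluate task_instance as soon as the DAG file is parsed. Jinja only renders text that reaches a templated field as a string. The snippet uses eval purely to reproduce this in isolation, and some_table is a hypothetical name.

```python
# The Jinja expression, exactly as tried in the DAG file.
expression = "{{ task_instance.xcom_pull(task_ids='check_last_run_date') }}"

# Evaluated as Python (which is what happens when it is written outside a
# string), the braces are set literals and task_instance is an undefined name.
try:
    eval(expression)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

# Kept inside a string, nothing is evaluated at parse time; the text is left
# untouched for Airflow to render later, provided the field is templated.
sql = "SELECT * FROM some_table WHERE update_date > '" + expression + "'"
print(sql)
```

So the expression has to stay inside the string that is passed to a templated argument such as sql.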
EDIT
The next task should be something like
fill_agg_table = SnowflakeOperator(
    task_id='fill_cust_agg_data',
    sql= str.replace ("""INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
( SELECT * FROM POC.STG."stg_atg_data" WHERE XXXXX < current_date)""",
        'XXXXX',
        {{ task_instance.xcom_pull(task_ids='check_last_run_date') }},
    snowflake_conn_id='snowflake_default',
    dag=dag ))
Late, but your title is the answer: xcom_pull() with no arguments will return the latest return_value for the DAG run, hence the value pushed by the immediate upstream task, assuming there is only one such task.
This is not explicit in the documentation, but I like it better than hard-coding a task name.
Your second task looks a bit unusual. If a field is templated, you can simply put the macro directly into the string.
In fact, using str.replace or str.format will mess up your macros and not work very well in Airflow. Other macros are listed here: https://airflow.apache.org/code.html#macros
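As a small illustration of how str.format interferes with macros, here is a plain-Python sketch that needs no Airflow. It assumes a query using the built-in {{ ds }} macro; the column name update_date is only a placeholder. Because str.format treats doubled braces as escaped single braces, the Jinja delimiters are gone before Airflow ever sees the string.

```python
# A templated query as it would be written for Airflow.
template = "SELECT * FROM some_table WHERE update_date > '{{ ds }}'"

# str.format interprets "{{" and "}}" as escaped braces and collapses them,
# so the result no longer contains a Jinja expression.
formatted = template.format()

print(template)
print(formatted)
```

After the call, the string contains { ds } with single braces, which Jinja treats as ordinary text and leaves unrendered.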
Make sure that the sql field is templated in your own operator. For how to do this, see the example code at https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/bigquery_operator.py and check the variable template_fields.
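Since the code of SnowflakeGetDataOperator is not shown, the following is only a guess at its shape, meant to show where template_fields goes. The constructor arguments, the fallback BaseOperator stub (there only so the sketch can be imported without Airflow installed), and the Airflow 2 provider import path for SnowflakeHook are assumptions; older Airflow releases ship the hook under a different module.

```python
try:
    from airflow.models import BaseOperator
except ImportError:
    # Stand-in so the sketch can be read and imported without Airflow.
    class BaseOperator:
        def __init__(self, task_id=None, **kwargs):
            self.task_id = task_id


class SnowflakeGetDataOperator(BaseOperator):
    # Airflow renders Jinja in every attribute listed here before execute() runs.
    template_fields = ("sql",)

    def __init__(self, sql, snowflake_conn_id="snowflake_default", **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.snowflake_conn_id = snowflake_conn_id

    def execute(self, context):
        # Assumption: Airflow 2 provider package layout.
        from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

        hook = SnowflakeHook(snowflake_conn_id=self.snowflake_conn_id)
        # The value returned by execute() is stored as the task's
        # return_value XCom, which downstream tasks can pull.
        return hook.get_records(self.sql)
```

Note that get_records returns a list of rows rather than a bare value, so a downstream template may need to index into the pulled result (for example with [0][0]) to get the single cell.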
Suggestion:
sql= """INSERT INTO oc.TEMP_COMPUTING.collector_customer_aggregative_data
( SELECT * FROM POC.STG."stg_atg_data" WHERE {{ task_instance.xcom_pull(task_ids='check_last_run_date') }} < current_date)""",