Can I Variable.get() or xcom_pull() a variable in the MAIN part of an Airflow script (outside a PythonOperator)?
I have a situation where I need to find a specific folder in S3 and pass it on to a PythonOperator in an Airflow script. I am doing this using another PythonOperator that finds the correct directory. I can successfully either xcom_push() or Variable.set() the value and read it back within that PythonOperator. The problem is that I need to pass this variable on to a separate PythonOperator that uses code in a Python library. Therefore, I need to Variable.get() or xcom_pull() this variable within the main part of the Airflow script. I have searched quite a bit and can't figure out whether this is possible. Below is some code for reference:
def check_for_done_file(**kwargs):
    ### This function does a bunch of stuff to find the correct S3 path to
    ### populate target_dir, this has been verified and works
    Variable.set("target_dir", done_file_list.pop())
    test = Variable.get("target_dir")
    print("TEST: ", test)

#### END OF METHOD, BEGIN MAIN
with my_dag:
    ### CALLING METHOD FROM MAIN, POPULATING VARIABLE
    check_for_done_file_task = PythonOperator(
        task_id = 'check_for_done_file',
        python_callable = check_for_done_file,
        dag = my_dag,
        op_kwargs = {
            "source_bucket" : "my_source_bucket",
            "source_path" : "path/to/the/s3/folder/I/need"
        }
    )

    target_dir = Variable.get("target_dir") # I NEED THIS VAR HERE.

    move_data_to_in_progress_task = PythonOperator(
        task_id = 'move-from-incoming-to-in-progress',
        python_callable = FileOps.move, # <--- PYTHON LIBRARY THAT COPIES FILES FROM SRC TO DEST
        dag = my_dag,
        op_kwargs = {
            "source_bucket" : "source_bucket",
            "source_path" : "path/to/my/s3/folder/" + target_dir,
            "destination_bucket" : "destination_bucket",
            "destination_path" : "path/to/my/s3/folder/" + target_dir,
            "recurse" : True
        }
    )
So, is the only way to accomplish this to augment the library to look for the "target_dir" variable? I don't think Airflow main has a context, and therefore what I want to do may not be possible. Any Airflow experts, please weigh in and let me know what my options might be.
op_kwargs is a templated field, so its values are rendered with Jinja when the task runs rather than when the DAG file is parsed. So you can use xcom_push:
def check_for_done_file(**kwargs):
    ...
    # xcom_push needs a key; 'return_value' is the key xcom_pull reads by default
    kwargs['ti'].xcom_push(key='return_value', value=y)
and use a Jinja template in op_kwargs:
move_data_to_in_progress_task = PythonOperator(
    task_id = 'move-from-incoming-to-in-progress',
    python_callable = FileOps.move, # <--- PYTHON LIBRARY THAT COPIES FILES FROM SRC TO DEST
    dag = my_dag,
    op_kwargs = {
        "source_bucket" : "source_bucket",
        "source_path" : "path/to/my/s3/folder/{{ ti.xcom_pull(task_ids='check_for_done_file') }}",
        "destination_bucket" : "destination_bucket",
        "destination_path" : "path/to/my/s3/folder/{{ ti.xcom_pull(task_ids='check_for_done_file') }}",
        "recurse" : True
    }
)
Also, add provide_context=True to your check_for_done_file_task task so that the context dictionary is passed to the callable. This is needed on Airflow 1.x; Airflow 2 passes the context automatically.
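If templating the strings is not convenient, another option is to leave the library untouched and wrap it in a small callable that pulls the XCom when the task runs. The following is only a sketch under assumptions: make_move_callable, FakeTaskInstance, fake_move and the value "some_dir" are hypothetical names introduced for illustration, and it assumes FileOps.move accepts the same keyword arguments that appear in op_kwargs above.

```python
def make_move_callable(move_fn, base_path, upstream_task_id):
    """Wrap a library function so the XCom is read at run time, not at parse time."""

    def _move(source_bucket, destination_bucket, recurse=True, **context):
        # Airflow supplies the running task instance as context["ti"].
        target_dir = context["ti"].xcom_pull(task_ids=upstream_task_id)
        path = base_path + target_dir
        return move_fn(
            source_bucket=source_bucket,
            source_path=path,
            destination_bucket=destination_bucket,
            destination_path=path,
            recurse=recurse,
        )

    return _move


class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance, for a local dry run only."""

    def __init__(self, xcoms):
        self._xcoms = xcoms

    def xcom_pull(self, task_ids):
        return self._xcoms[task_ids]


def fake_move(**kwargs):
    """Hypothetical stand-in for FileOps.move; it just echoes its arguments."""
    return kwargs


# Local dry run without Airflow: the fake task instance plays the upstream XCom.
move_callable = make_move_callable(
    fake_move, "path/to/my/s3/folder/", "check_for_done_file"
)
result = move_callable(
    source_bucket="source_bucket",
    destination_bucket="destination_bucket",
    ti=FakeTaskInstance({"check_for_done_file": "some_dir"}),
)
print(result["source_path"])
```

In the DAG, this would presumably be wired up as python_callable = make_move_callable(FileOps.move, "path/to/my/s3/folder/", "check_for_done_file"), with only the bucket names and "recurse" left in op_kwargs, and provide_context=True on Airflow 1.x.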