Can I get() or xcom.pull() a variable in the MAIN part of an Airflow script (outside a PythonOperator)?

I need to find a specific folder in S3 and pass it on to a PythonOperator in an Airflow script. I do this with another PythonOperator that finds the correct directory. I can successfully xcom.push() or Variable.set() the value and read it back within that PythonOperator. The problem is that I need to pass this value on to a separate PythonOperator whose callable lives in a Python library. I therefore need to Variable.get() or xcom.pull() the value in the main part of the Airflow script. I have searched quite a bit and can't figure out whether this is possible. Below is some code for reference:

def check_for_done_file(**kwargs):

    ### This function does a bunch of stuff to find the correct S3 path to
    ### populate target_dir, this has been verified and works

    Variable.set("target_dir", done_file_list.pop())
    test = Variable.get("target_dir")
    print("TEST: ", test)

#### END OF METHOD, BEGIN MAIN

with my_dag:

    ### CALLING METHOD FROM MAIN, POPULATING VARIABLE

    check_for_done_file_task = PythonOperator(
        task_id='check_for_done_file',
        python_callable=check_for_done_file,
        dag=my_dag,
        op_kwargs={
            "source_bucket": "my_source_bucket",
            "source_path": "path/to/the/s3/folder/I/need",
        },
    )

    target_dir = Variable.get("target_dir")  # I NEED THIS VAR HERE.

    move_data_to_in_progress_task = PythonOperator(
        task_id='move-from-incoming-to-in-progress',
        python_callable=FileOps.move,  # <--- PYTHON LIBRARY THAT COPIES FILES FROM SRC TO DEST
        dag=my_dag,
        op_kwargs={
            "source_bucket": "source_bucket",
            "source_path": "path/to/my/s3/folder/" + target_dir,
            "destination_bucket": "destination_bucket",
            "destination_path": "path/to/my/s3/folder/" + target_dir,
            "recurse": True,
        },
    )

So, is the only way to accomplish this to augment the library to look for the "target_dir" variable? I don't think Airflow main has a context, and therefore what I want to do may not be possible. Any Airflow experts, please weigh in and let me know what my options might be.

op_kwargs is a templated field, so you can use xcom_push:

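def check_for_done_file(**kwargs):
    ...
    # xcom_push requires a key; 'return_value' is the default key that
    # xcom_pull reads when no key is given
    kwargs['ti'].xcom_push(key='return_value', value=y)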

and use a Jinja template in op_kwargs:

    move_data_to_in_progress_task = PythonOperator(
        task_id='move-from-incoming-to-in-progress',
        python_callable=FileOps.move,  # <--- PYTHON LIBRARY THAT COPIES FILES FROM SRC TO DEST
        dag=my_dag,
        op_kwargs={
            "source_bucket": "source_bucket",
            "source_path": "path/to/my/s3/folder/{{ ti.xcom_pull(task_ids='check_for_done_file') }}",
            "destination_bucket": "destination_bucket",
            "destination_path": "path/to/my/s3/folder/{{ ti.xcom_pull(task_ids='check_for_done_file') }}",
            "recurse": True,
        },
    )

Also, add provide_context=True to your check_for_done_file_task task to pass the context dictionary to the callable. (This flag is only needed on Airflow 1.x; Airflow 2 passes the context automatically.)
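
If templating the arguments is not convenient, another option that avoids changing the library is a thin wrapper callable that pulls the XCom at run time and then calls the library function. The sketch below is only an illustration under assumptions: FileOps here is a stand-in with a guessed signature (the real library is not shown in the question), FakeTaskInstance is a hypothetical stub that mimics only ti.xcom_pull so the wrapper can be dry-run without Airflow, and "some_dir" is a made-up directory name.

```python
class FileOps:
    """Stand-in for the question's library; the real signature is assumed."""

    @staticmethod
    def move(source_bucket, source_path, destination_bucket,
             destination_path, recurse=False):
        # The stand-in only echoes its arguments so the call can be inspected.
        return (source_bucket, source_path, destination_bucket,
                destination_path, recurse)


def move_to_in_progress(ti, **_context):
    """Wrapper callable: resolve the directory at run time, then call the library."""
    # With no key given, xcom_pull reads the default 'return_value' key.
    target_dir = ti.xcom_pull(task_ids="check_for_done_file")
    if not target_dir:
        raise ValueError("check_for_done_file did not publish a target directory")
    path = "path/to/my/s3/folder/" + target_dir
    return FileOps.move(
        source_bucket="source_bucket",
        source_path=path,
        destination_bucket="destination_bucket",
        destination_path=path,
        recurse=True,
    )


class FakeTaskInstance:
    """Hypothetical stub mimicking only ti.xcom_pull, for a local dry run."""

    def __init__(self, xcoms):
        self._xcoms = xcoms

    def xcom_pull(self, task_ids=None, key="return_value"):
        return self._xcoms.get((task_ids, key))


# Dry run: pretend the upstream task published "some_dir".
fake_ti = FakeTaskInstance({("check_for_done_file", "return_value"): "some_dir"})
result = move_to_in_progress(ti=fake_ti)
print(result)
```

In the DAG, the wrapper would replace FileOps.move as the python_callable of move_data_to_in_progress_task (with provide_context=True on Airflow 1.x), so the lookup happens when the task runs rather than when the script is parsed.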


 