
Passing a list to a Python callable in Airflow

I have a DAG that collects a list of CSV files and then loads them into data frames for import.

#CREATING CSV FILES
def csv_filess():
    print("Creating CSV files....")
    csv_files = []
    for file in os.listdir(dataset_dir):
        if file.endswith('.csv'):
            csv_files.append(file)
    print("***Step 3: CSV files created***")
    return csv_files

#CREATING DATAFRAME
def create_df(csv_files):      
    print("Creating dataframe....")     
    df = {}
    for file in csv_files:
        try:
            df[file] = pd.read_csv(data_path+file)
        except UnicodeDecodeError:
            df[file] = pd.read_csv(dataset_dir+file, encoding="ISO-8859-1")
    
    print("***Step 4: CSV files created in df!***")
    return df


t3 = PythonOperator(
    task_id='create_csv',
    python_callable=csv_filess, provide_context=True,
    dag=dag)

t4 = PythonOperator(
    task_id='create_df',
    python_callable=create_df,
    op_args = t3.output,
    provide_context=True,
    dag=dag)

But I get an error:

create_df() takes 1 positional argument but 4 were given

I think it's because I have to call it this way first?

csv_files = csv_filess()

But how do I define that on an Airflow task?

Returning a value from a PythonOperator automatically stores the output as an XCom with key "return_value". So you'll get an XCom from task create_csv with key return_value and value ["file1.csv", "file2.csv", ...]. You can inspect all XComs in Airflow under Admin -> XComs, or per task by clicking a task -> Instance Details -> XCom.
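
As a rough illustration of reading that XCom by hand, the sketch below pulls the value inside a callable through the task instance in the context. FakeTaskInstance and files_from_xcom are hypothetical names used only so the snippet runs without Airflow; in a real DAG, Airflow supplies the task instance itself, and the real xcom_pull accepts more parameters than this stand-in.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance, for local runs only."""

    def __init__(self, xcoms):
        # Maps (task_id, key) -> stored value, mimicking the XCom table.
        self._xcoms = xcoms

    def xcom_pull(self, task_ids, key="return_value"):
        # Like Airflow, default to the key used for a callable's return value.
        return self._xcoms.get((task_ids, key))


def files_from_xcom(**context):
    """Fetch the list returned by the create_csv task from its XCom."""
    ti = context["ti"]
    return ti.xcom_pull(task_ids="create_csv")


# Simulate what create_csv would have stored after returning its list.
fake_ti = FakeTaskInstance(
    {("create_csv", "return_value"): ["file1.csv", "file2.csv"]}
)
pulled = files_from_xcom(ti=fake_ti)
print(pulled)
```

Passing t3.output, as the question does, lets Airflow resolve the same XCom for you, so an explicit pull like this is optional.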

In your create_df task, you then pass the output of create_csv using t3.output. This is a reference to the previously created XCom. When op_args is given a list, Airflow unpacks it into positional arguments, so a list of four file names reaches create_df as four separate arguments; that is why the error says 4 were given. To accept them, declare the parameter with a *:

def create_df(*csv_files):
    ...
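
The unpacking can be reproduced without Airflow. The sketch below assumes the operator invokes its callable as python_callable(*op_args, **op_kwargs); call_like_python_operator is a hypothetical helper that mimics only that call, and the file names are made up.

```python
def call_like_python_operator(python_callable, op_args=(), op_kwargs=None):
    """Hypothetical helper: unpack op_args/op_kwargs the way the operator does."""
    return python_callable(*op_args, **(op_kwargs or {}))


def create_df_single(csv_files):
    # Original signature: expects the whole list as one argument.
    return list(csv_files)


def create_df_star(*csv_files):
    # Suggested signature: collects the unpacked items back into a tuple.
    return list(csv_files)


files = ["a.csv", "b.csv", "c.csv", "d.csv"]  # made-up file names

# Passing the list itself as op_args spreads it over four positional arguments.
try:
    call_like_python_operator(create_df_single, op_args=files)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Option 1: accept *csv_files.
star_result = call_like_python_operator(create_df_star, op_args=files)

# Option 2: keep the signature and wrap the list so it is one positional argument.
wrapped_result = call_like_python_operator(create_df_single, op_args=[files])

# Option 3: keep the signature and pass the list by keyword.
kwargs_result = call_like_python_operator(
    create_df_single, op_kwargs={"csv_files": files}
)

print(error_message)
```

In the DAG itself, options 2 and 3 would presumably correspond to op_args=[t3.output] or op_kwargs={"csv_files": t3.output}, which leave create_df(csv_files) unchanged; this is worth verifying against your Airflow version.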
