Passing a list to a Python callable in Airflow
I have a DAG that collects a list of CSV files and then loads them into data frames for import.

    # CREATING CSV FILES
    def csv_filess():
        print("Creating CSV files....")
        csv_files = []
        for file in os.listdir(dataset_dir):
            if file.endswith('.csv'):
                csv_files.append(file)
        print("***Step 3: CSV files created***")
        return csv_files

    # CREATING DATAFRAME
    def create_df(csv_files):
        print("Creating dataframe....")
        df = {}
        for file in csv_files:
            try:
                df[file] = pd.read_csv(data_path+file)
            except UnicodeDecodeError:
                df[file] = pd.read_csv(dataset_dir+file, encoding="ISO-8859-1")
        print("***Step 4: CSV files created in df!***")
        return df

    t3 = PythonOperator(
        task_id='create_csv',
        python_callable=csv_filess,
        provide_context=True,
        dag=dag)

    t4 = PythonOperator(
        task_id='create_df',
        python_callable=create_df,
        op_args=t3.output,
        provide_context=True,
        dag=dag)

But I get an error:

    create_df() takes 1 positional argument but 4 were given

I think it's because I first have to call the function like this:

    csv_files = csv_filess()

But how do I define that in an Airflow task?
Returning a value from a PythonOperator automatically stores the output as an XCom with key "return_value". So you'll get an XCom from task `create_csv` with key `return_value` and value `["file1.csv", "file2.csv", ...]`.
You can inspect all XComs in Airflow under Admin -> XComs, or per task by clicking a task -> Instance Details -> XCom.
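To make the key/value pairing concrete, here is a minimal sketch of reading that XCom by hand. `FakeTaskInstance` and `list_pulled_files` are hypothetical names invented for illustration so the snippet runs without Airflow; in a real task Airflow supplies the task instance, and its `xcom_pull` uses the `return_value` key by default.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance (illustration only)."""

    def __init__(self, xcoms):
        # Maps (task_id, key) to the stored value.
        self._xcoms = xcoms

    def xcom_pull(self, task_ids, key="return_value"):
        # Look up the value stored by the given task under the given key.
        return self._xcoms[(task_ids, key)]


def list_pulled_files(ti):
    """Read the list that the create_csv task returned."""
    return ti.xcom_pull(task_ids="create_csv")


# Pretend create_csv has already run and returned two file names.
ti = FakeTaskInstance({("create_csv", "return_value"): ["file1.csv", "file2.csv"]})
pulled = list_pulled_files(ti)
print(pulled)
```

The point is only that the returned list is stored once, under one key, and any downstream task can look it up by task id.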
In your `create_df` task, you then pass the output of `create_csv` using `t3.output`, which is a reference to the previously created XCom. When `op_args` is given a list, Airflow unpacks it into separate positional arguments, so a list of four file names reaches `create_df` as four arguments, hence the error.
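The unpacking can be reproduced in plain Python. This is a rough sketch, not Airflow's actual code: `run_callable` is a hypothetical helper that mimics calling the function with `*op_args`, and the file names are made up.

```python
def run_callable(python_callable, op_args=(), op_kwargs=None):
    """Hypothetical stand-in for how the operator invokes its callable."""
    # The list in op_args is spread into separate positional arguments.
    return python_callable(*op_args, **(op_kwargs or {}))


def create_df(csv_files):
    # Expects the whole list as one argument.
    return list(csv_files)


files = ["a.csv", "b.csv", "c.csv", "d.csv"]

error_message = None
try:
    # Four list elements become four positional arguments.
    run_callable(create_df, op_args=files)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

With four file names in the list, the `TypeError` raised here is the same kind of complaint as in the question: one positional parameter, four arguments given.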
So you'll have to accept multiple arguments with a `*` to do the trick:

    def create_df(*csv_files):
        ...
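
A small sketch of why the `*` works, plus two alternatives that keep the original one-parameter signature. `run_callable`, `create_df_star` and `create_df_list` are hypothetical names for illustration, and the snippet runs without Airflow. In a real DAG the matching alternatives would presumably be `op_args=[t3.output]` or `op_kwargs={"csv_files": t3.output}`; treat that as an assumption to verify against your Airflow version.

```python
def run_callable(python_callable, op_args=(), op_kwargs=None):
    """Hypothetical stand-in: positional args are unpacked, keyword args passed by name."""
    return python_callable(*op_args, **(op_kwargs or {}))


def create_df_star(*csv_files):
    # Each file name arrives as its own positional argument.
    return list(csv_files)


def create_df_list(csv_files):
    # The whole list arrives as a single argument.
    return list(csv_files)


files = ["file1.csv", "file2.csv"]

# Option 1: accept any number of positional arguments.
via_star = run_callable(create_df_star, op_args=files)

# Option 2: wrap the list so that unpacking yields exactly one argument.
via_wrapped = run_callable(create_df_list, op_args=[files])

# Option 3: pass the list by keyword instead of by position.
via_kwargs = run_callable(create_df_list, op_kwargs={"csv_files": files})

print(via_star, via_wrapped, via_kwargs)
```

All three calls hand the same file names to the function; they differ only in whether the function signature or the call site absorbs the unpacking.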