
How can Airflow create a Dataflow job from a python operator?

When I run my Beam pipeline from the command line, using the direct runner or the Dataflow runner, it works fine.

Example:

$ python my_pipeline.py --key /path/to/gcp/service/key.json --project gcp_project_name

But when I try to use Airflow, I have two options: the bash operator or the python operator.

Using the bash operator succeeds, but it limits my ability to use Airflow features.
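For reference, the bash-operator route might look like the following sketch (Airflow 1.x import paths; the DAG id, schedule, and pipeline path are assumptions for illustration):

```python
# Sketch of the BashOperator route: it shells out the exact command
# used on the command line. DAG id and paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="beam_via_bash",          # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
)

run_pipeline = BashOperator(
    task_id="run_beam_pipeline",
    # Same command as running the pipeline by hand.
    bash_command=(
        "python /path/to/my_pipeline.py "
        "--key /path/to/gcp/service/key.json "
        "--project gcp_project_name"
    ),
    dag=dag,
)
```

This works because the Beam pipeline runs in a fresh Python process, outside of Airflow's own interpreter, but the task cannot easily exchange data with other tasks.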

But what I am trying to do is run it as a python operator. So I import the module inside the Airflow DAG file and then run it with a PythonOperator.
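The PythonOperator route described above might be sketched like this (it assumes the pipeline module exposes a `run(argv)` entry point, which is an assumption; adapt to your module's actual interface):

```python
# Sketch of the PythonOperator route: the Beam pipeline module is
# imported into the DAG file and called in-process.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

import my_pipeline  # the Beam pipeline module imported into the DAG file

def _run_beam():
    # Pass the same flags that the command-line run used.
    # my_pipeline.run(argv) is a hypothetical entry point.
    my_pipeline.run([
        "--key", "/path/to/gcp/service/key.json",
        "--project", "gcp_project_name",
    ])

dag = DAG(
    dag_id="beam_via_python",        # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
)

run_pipeline = PythonOperator(
    task_id="run_beam_pipeline",
    python_callable=_run_beam,
    dag=dag,
)
```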

It also works fine if I use the local runner, but when I change it to the Dataflow runner, it fails after creating the job on GCP Dataflow with this error:

ImportError: No module named airflow.bin.cli

What am I missing to let Airflow create a Dataflow job from a python operator?

OK, that is not the perfect solution, but you can use

DataFlowPythonOperator()

which will run the exact same bash command we mentioned before. It is a workaround, and not equal to the PythonOperator, but more like running a BashOperator... You still can't use the strength of Airflow features (like XCom) in this case... Docs
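A minimal sketch of this workaround, assuming the Airflow 1.x contrib import path, might look as follows (the option values and connection id are placeholders):

```python
# Sketch using DataFlowPythonOperator, which launches the pipeline in a
# separate process, much like the BashOperator route. Values are
# placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

dag = DAG(
    dag_id="beam_via_dataflow_operator",  # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
)

run_pipeline = DataFlowPythonOperator(
    task_id="run_beam_on_dataflow",
    py_file="/path/to/my_pipeline.py",    # local path to the pipeline file
    options={
        # Forwarded to the pipeline as command-line flags.
        "project": "gcp_project_name",
    },
    gcp_conn_id="google_cloud_default",   # connection holding the GCP key
    dag=dag,
)
```

Because the pipeline runs outside the Airflow interpreter, the `airflow.bin.cli` import error from pickling the DAG process's main session does not occur.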

