
Airflow operator to use gcloud beta dataproc commands

Does anyone know whether there is an Airflow operator that can do what the gcloud beta commands do? I'm trying to launch a Spark job on a GKE cluster. The gcloud beta command works, but it does not using DataprocSparkOperator .

With this operator, the job keeps spinning but the driver pod is never instantiated; it does work when running the gcloud command referenced here: https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke

To be completely honest, I don't believe Airflow is intended to run gcloud commands. If there is no operator you can use, it's better to use the Google API in conjunction with PythonOperator .
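As a sketch of that API-plus-PythonOperator route, the snippet below uses the google-cloud-dataproc client library to submit a Spark job. The cluster, class, and jar names are placeholders, and the library import is kept lazy so the module loads even where the package is absent; adapt it to your project.

```python
# Sketch of submitting a Spark job via the Dataproc API instead of gcloud.
# Assumptions: google-cloud-dataproc is installed in the Airflow environment,
# and all project/region/cluster/jar names below are placeholders.

def build_spark_job(cluster_name, main_class, jar_uri):
    """Build the Job payload dict that JobControllerClient.submit_job expects."""
    return {
        "placement": {"cluster_name": cluster_name},
        "spark_job": {
            "main_class": main_class,
            "jar_file_uris": [jar_uri],
        },
    }

def submit_spark_job(project_id, region, job):
    # Lazy import so the sketch stays importable without the client library.
    from google.cloud import dataproc_v1

    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    return client.submit_job(
        request={"project_id": project_id, "region": region, "job": job}
    )
```

You would then point a PythonOperator's python_callable at a small wrapper around submit_spark_job, passing your real project and region.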

If you really want to use gcloud commands, you'll need to install the gcloud SDK in your Airflow instance: https://cloud.google.com/sdk/docs/downloads-interactive#silent . It's quite heavy, so if you run Airflow as a service it will take longer to deploy.

After that you'll need to authenticate; the service-account flow is probably optimal for you: https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account .
You'll have to keep the service-account key somewhere safe, e.g. HDFS (if you have a cluster). For local purposes it can be stored locally.
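The install-and-authenticate steps above could look roughly like this (a CLI fragment, not something you can run as-is: the install directory, key path, and project ID are placeholders for your environment):

```shell
# Silent (non-interactive) gcloud SDK install, per the linked docs.
curl -sSL https://sdk.cloud.google.com > install_gcloud.sh
bash install_gcloud.sh --disable-prompts --install-dir=/opt
export PATH="/opt/google-cloud-sdk/bin:$PATH"

# Authenticate with the service-account key (placeholder path).
gcloud auth activate-service-account --key-file=/opt/airflow/keys/sa.json
gcloud config set project my-project   # placeholder project ID
```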

Once authentication is done, just use BashOperator to do what you want: you now have gcloud installed in your Airflow environment.
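A minimal sketch of that BashOperator approach, assuming Airflow 2.x and placeholder project, region, cluster, and jar values (the command mirrors the gcloud beta submit command from the linked Dataproc-on-GKE docs):

```python
# Sketch: wrap the working "gcloud beta dataproc jobs submit spark" command
# in a BashOperator. All identifiers below are placeholders.

def build_submit_command(project, region, cluster, jar, main_class):
    """Assemble the gcloud submit command as one string for bash_command."""
    return (
        "gcloud beta dataproc jobs submit spark "
        f"--project={project} "
        f"--region={region} "
        f"--cluster={cluster} "
        f"--class={main_class} "
        f"--jars={jar}"
    )

def make_submit_task(dag):
    # Lazy import; this is the Airflow 2.x import path for BashOperator.
    from airflow.operators.bash import BashOperator

    return BashOperator(
        task_id="submit_spark_gke",
        bash_command=build_submit_command(
            "my-project", "us-central1", "my-gke-cluster",
            "gs://my-bucket/spark-examples.jar",
            "org.apache.spark.examples.SparkPi",
        ),
        dag=dag,
    )
```

This only works, of course, if the gcloud SDK was installed and authenticated on the workers as described above.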
