
Airflow operator to use gcloud beta dataproc commands

Does anyone know whether there is an Airflow operator that can do what the gcloud beta commands do? I'm trying to launch a Spark job on a GKE cluster. The gcloud beta commands work, but DataprocSparkOperator does not.

With this operator, the job keeps running but the driver pod is never instantiated. It does work when I run the gcloud command referenced here: https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke

To be completely honest, I believe that Airflow is not intended to run gcloud commands. If there is no operator that fits, it's better to use the Google API in conjunction with PythonOperator.
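A minimal sketch of the PythonOperator route, assuming the google-cloud-dataproc client library is installed on the Airflow workers. The function names and all argument values are hypothetical. Whether submitting through this API makes the driver pod start on a GKE-backed cluster is not something the question confirms, so treat it as something to try rather than a known fix.

```python
def build_spark_job(cluster_name, main_class, jar_uris, args=None):
    """Return a Dataproc Spark job payload as a plain dict (hypothetical helper)."""
    return {
        "placement": {"cluster_name": cluster_name},
        "spark_job": {
            "main_class": main_class,
            "jar_file_uris": list(jar_uris),
            "args": list(args or []),
        },
    }


def submit_spark_job(project_id, region, job):
    """Callable meant to be passed as PythonOperator(python_callable=...)."""
    # Imported lazily so this module can be loaded without the client library.
    from google.cloud import dataproc_v1

    # Dataproc job submission goes through a regional endpoint.
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    submitted = client.submit_job(
        request={"project_id": project_id, "region": region, "job": job}
    )
    return submitted.reference.job_id
```

In a DAG, `submit_spark_job` would be given to PythonOperator along with `op_kwargs` carrying the project, the region, and the dict returned by `build_spark_job`.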

If you really want to use gcloud commands, you'll need to install the gcloud SDK on your Airflow instance: https://cloud.google.com/sdk/docs/downloads-interactive#silent. It's quite heavy, so if you run Airflow as a service, deployment will take longer.

After that you'll need to authorize. The service-account route might be the best option for you: https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account. You'll have to keep the service-account key in a safe place, e.g. HDFS (if you have a cluster). For local use it can be stored locally.
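As a rough illustration of the authorization step, the sketch below wraps `gcloud auth activate-service-account` in Python. It assumes gcloud is already on the PATH and that the key file has been copied to the local filesystem first. The helper names and the key path are hypothetical.

```python
import subprocess


def activation_argv(key_file):
    """Build the argument list for activating a service account with gcloud."""
    return ["gcloud", "auth", "activate-service-account", f"--key-file={key_file}"]


def activate_service_account(key_file):
    """Run the activation command; raises CalledProcessError if gcloud fails."""
    subprocess.run(activation_argv(key_file), check=True)
```

Passing an argument list rather than a shell string avoids quoting problems if the key path contains spaces.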

Once authorization is done, just use BashOperator to do what you want, since gcloud is now installed on your Airflow instance.
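One way this could look, as a sketch only: a helper that assembles a `gcloud beta dataproc jobs submit spark` command line, and a second helper that wraps it in a BashOperator. The cluster, region, class, and jar values are placeholders, the helper names are hypothetical, and the import path assumes Airflow 2.x.

```python
import shlex


def dataproc_spark_command(cluster, region, main_class, jar_uri, job_args=()):
    """Assemble a shell-safe gcloud command that submits a Spark job."""
    parts = [
        "gcloud", "beta", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={main_class}",
        f"--jars={jar_uri}",
    ]
    if job_args:
        # Everything after "--" is passed to the Spark application itself.
        parts.append("--")
        parts.extend(job_args)
    return " ".join(shlex.quote(p) for p in parts)


def make_submit_task(dag, command):
    """Wrap the command in a BashOperator (assumes Airflow 2.x is installed)."""
    # Imported lazily so the command builder works without Airflow present.
    from airflow.operators.bash import BashOperator

    return BashOperator(task_id="submit_spark_job", bash_command=command, dag=dag)
```

The task only succeeds if the worker that runs it has gcloud installed and an activated account, as described in the steps above.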


