Dataproc on GKE: python packages listed in properties not installed

I created a Dataproc cluster on a GKE cluster. The required packages are already listed in the cluster properties, following the examples here. But when I submitted a job, it failed with ModuleNotFoundError:

...
Waiting for job output...
 PYSPARK_PYTHON=/opt/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-8-jdk-amd64
SPARK_EXTRA_CLASSPATH=
Merging Spark configs
Skipping merging /opt/spark/conf/spark-defaults.conf, file does not exist.
Skipping merging /opt/spark/conf/log4j.properties, file does not exist.
Skipping merging /opt/spark/conf/spark-env.sh, file does not exist.
Skipping custom init script, file does not exist.
Running heartbeat loop
Traceback (most recent call last):
  File "/tmp/spark-d6516b57-0924-4ce2-9de8-a5c1116667b4/pkg.py", line 1, in <module>
    from google.cloud import secretmanager
ModuleNotFoundError: No module named 'google'

This is the gcloud command I used:

gcloud dataproc clusters gke create gke-dp --region=asia-southeast1 --spark-engine-version=3.1 \
--gke-cluster=gke-spark --gke-cluster-location=asia-southeast1-b --namespace=dataproc \
--pools='name=dp-default,roles=default,machineType=n2-standard-2,min=1,max=1' \
--pools='name=dp-workers,roles=spark-driver;spark-executor,machineType=n2-standard-4,min=1,max=4' \
--properties='^#^dataproc:pip.packages=google-cloud-secret-manager==2.15.0,numpy==1.24.1#spark:spark.jars=https://jdbc.postgresql.org/download/postgresql-42.5.1.jar' \
--properties="dataproc:dataproc.gke.agent.google-service-account=dataproc@de-project.iam.gserviceaccount.com" \
--properties="dataproc:dataproc.gke.spark.driver.google-service-account=dataproc@de-project.iam.gserviceaccount.com" \
--properties="dataproc:dataproc.gke.spark.executor.google-service-account=dataproc@de-project.iam.gserviceaccount.com"

This functionality is not supported by Dataproc on GKE.
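
One possible workaround (not part of the original answer, and not verified on Dataproc on GKE) is to ship the Python dependencies with the job itself using Spark's standard conda-pack/archives mechanism, instead of relying on the dataproc:pip.packages cluster property. The bucket path, environment name, and Python version below are placeholders:

# 1. Build a relocatable conda environment containing the missing packages
#    (conda-pack is installed into the base environment to provide `conda pack`).
conda install -y -n base -c conda-forge conda-pack
conda create -y -n job-deps -c conda-forge python=3.9 pip
conda run -n job-deps pip install google-cloud-secret-manager==2.15.0 numpy==1.24.1
conda pack -n job-deps -o job-deps.tar.gz

# 2. Stage the packed environment where the cluster can read it (placeholder bucket).
gsutil cp job-deps.tar.gz gs://my-staging-bucket/envs/job-deps.tar.gz

# 3. Submit the job; the archive is unpacked into the job's working directory as
#    ./environment, and PySpark is pointed at its interpreter instead of the
#    image's /opt/conda/bin/python.
gcloud dataproc jobs submit pyspark pkg.py \
  --cluster=gke-dp --region=asia-southeast1 \
  --archives="gs://my-staging-bucket/envs/job-deps.tar.gz#environment" \
  --properties="spark.pyspark.python=./environment/bin/python,spark.pyspark.driver.python=./environment/bin/python"

Alternatively, Dataproc on GKE documents custom container images, so the packages could be baked into an image derived from the provided Spark image; that avoids repackaging and uploading the environment for every job.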
