
Dataproc on GKE: python packages listed in properties not installed

I created a Dataproc cluster on a GKE cluster. The required packages are included in the cluster properties, as in the example here. But when I submit a job, it fails with a ModuleNotFoundError:

...
Waiting for job output...
 PYSPARK_PYTHON=/opt/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-8-jdk-amd64
SPARK_EXTRA_CLASSPATH=
Merging Spark configs
Skipping merging /opt/spark/conf/spark-defaults.conf, file does not exist.
Skipping merging /opt/spark/conf/log4j.properties, file does not exist.
Skipping merging /opt/spark/conf/spark-env.sh, file does not exist.
Skipping custom init script, file does not exist.
Running heartbeat loop
Traceback (most recent call last):
  File "/tmp/spark-d6516b57-0924-4ce2-9de8-a5c1116667b4/pkg.py", line 1, in <module>
    from google.cloud import secretmanager
ModuleNotFoundError: No module named 'google'

Here is the gcloud command I used:

gcloud dataproc clusters gke create gke-dp --region=asia-southeast1 --spark-engine-version=3.1 \
--gke-cluster=gke-spark --gke-cluster-location=asia-southeast1-b --namespace=dataproc \
--pools='name=dp-default,roles=default,machineType=n2-standard-2,min=1,max=1' \
--pools='name=dp-workers,roles=spark-driver;spark-executor,machineType=n2-standard-4,min=1,max=4' \
--properties='^#^dataproc:pip.packages=google-cloud-secret-manager==2.15.0,numpy==1.24.1#spark:spark.jars=https://jdbc.postgresql.org/download/postgresql-42.5.1.jar' \
--properties="dataproc:dataproc.gke.agent.google-service-account=dataproc@de-project.iam.gserviceaccount.com" \
--properties="dataproc:dataproc.gke.spark.driver.google-service-account=dataproc@de-project.iam.gserviceaccount.com" \
--properties="dataproc:dataproc.gke.spark.executor.google-service-account=dataproc@de-project.iam.gserviceaccount.com"

This is not supported by Dataproc on GKE.
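Since the dataproc:pip.packages property is ignored for GKE-based virtual clusters, a common workaround is to bake the required packages into the Spark container image and point the job at that image. The sketch below is illustrative only: the base image reference and the Artifact Registry path are placeholders (check the Dataproc on GKE documentation for the image matching --spark-engine-version=3.1), the pip path is inferred from PYSPARK_PYTHON=/opt/conda/bin/python in the job output, and it assumes the job honours the standard Spark-on-Kubernetes property spark.kubernetes.container.image.

# Dockerfile that layers the missing packages on top of the Dataproc Spark image.
# The FROM line is a placeholder; use the base image documented for your Spark engine version.
cat > Dockerfile <<'EOF'
FROM <dataproc-spark-base-image-for-3.1>
RUN /opt/conda/bin/pip install google-cloud-secret-manager==2.15.0 numpy==1.24.1
EOF

# Build and push the image (the repository path below is a placeholder in the question's project).
docker build -t asia-southeast1-docker.pkg.dev/de-project/spark/spark-py:deps .
docker push asia-southeast1-docker.pkg.dev/de-project/spark/spark-py:deps

# Submit the job against the virtual cluster, pointing Spark at the custom image.
gcloud dataproc jobs submit pyspark pkg.py \
  --cluster=gke-dp --region=asia-southeast1 \
  --properties=spark.kubernetes.container.image=asia-southeast1-docker.pkg.dev/de-project/spark/spark-py:deps

For pure-Python dependencies, shipping a zip archive with --py-files on gcloud dataproc jobs submit pyspark can also work, but packages with compiled extensions such as numpy and grpcio generally require the custom-image route.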

