Dataproc on GKE: python packages listed in properties not installed

I created a Dataproc cluster on a GKE cluster. The required packages are already listed in the cluster properties, as in the examples here. But when I submitted a job, it failed with a ModuleNotFoundError:

Waiting for job output...
Merging Spark configs
Skipping merging /opt/spark/conf/spark-defaults.conf, file does not exist.
Skipping merging /opt/spark/conf/log4j.properties, file does not exist.
Skipping merging /opt/spark/conf/spark-env.sh, file does not exist.
Skipping custom init script, file does not exist.
Running heartbeat loop
Traceback (most recent call last):
  File "/tmp/spark-d6516b57-0924-4ce2-9de8-a5c1116667b4/pkg.py", line 1, in <module>
    from google.cloud import secretmanager
ModuleNotFoundError: No module named 'google'

This is the gcloud command I used:

gcloud dataproc clusters gke create gke-dp --region=asia-southeast1 --spark-engine-version=3.1 \
--gke-cluster=gke-spark --gke-cluster-location=asia-southeast1-b --namespace=dataproc \
--pools='name=dp-default,roles=default,machineType=n2-standard-2,min=1,max=1' \
--pools='name=dp-workers,roles=spark-driver;spark-executor,machineType=n2-standard-4,min=1,max=4' \
--properties='^#^dataproc:pip.packages=google-cloud-secret-manager==2.15.0,numpy==1.24.1#spark:spark.jars=https://jdbc.postgresql.org/download/postgresql-42.5.1.jar' \
--properties="dataproc:dataproc.gke.agent.google-service-account=dataproc@de-project.iam.gserviceaccount.com" \
--properties="dataproc:dataproc.gke.spark.driver.google-service-account=dataproc@de-project.iam.gserviceaccount.com"
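
For readers unfamiliar with the `^#^` prefix in the first --properties flag: it is gcloud's alternate-delimiter syntax, which switches the list separator from `,` to `#` so that the comma inside the pip.packages value is not treated as a separator. The sketch below only illustrates how such a value breaks down into key/value pairs. The function name parse_gcloud_dict_flag is hypothetical, and this is not gcloud's actual parser.

```python
def parse_gcloud_dict_flag(value, default_delim=","):
    """Illustrative only: split a gcloud-style dict flag value into pairs.

    A leading ^DELIM^ prefix replaces the default "," separator.
    """
    delim = default_delim
    if value.startswith("^"):
        end = value.find("^", 1)
        if end > 1:
            delim, value = value[1:end], value[end + 1:]
    pairs = {}
    for item in value.split(delim):
        # Split on the first "=" only, so "==" version pins stay in the value.
        key, _, val = item.partition("=")
        pairs[key] = val
    return pairs


props = parse_gcloud_dict_flag(
    "^#^dataproc:pip.packages=google-cloud-secret-manager==2.15.0,numpy==1.24.1"
    "#spark:spark.jars=https://jdbc.postgresql.org/download/postgresql-42.5.1.jar"
)
for key, val in props.items():
    print(key, "->", val)
```

So the flag itself is well formed and carries two properties; the failure is not caused by the delimiter syntax.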

This functionality (installing Python packages through the dataproc:pip.packages cluster property) is not supported by Dataproc on GKE.
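
Since the property has no effect here, the job only fails once it reaches the first import. A minimal sketch of a check that could sit at the top of the job script to report every missing module at once, assuming the module names from the question; the helper names missing_modules and require_modules are hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            # For dotted names find_spec imports the parent package first,
            # which raises ModuleNotFoundError if the parent is absent.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


def require_modules(names):
    """Raise one error that lists every module the driver image lacks."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError(
            "Modules not available in this image: " + ", ".join(missing)
        )
```

Calling `require_modules(["google.cloud.secretmanager", "numpy"])` before the real imports would turn the traceback above into a single message naming all absent packages. It does not install anything; the packages still have to be made available to the driver and executors by some other means.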

