Passing requirements.txt to Google Cloud PySpark Batch Job
I am trying to run a PySpark script through a Google Dataproc Batch Job.
My script should connect to Firestore to collect some data from there, so I need access to the firebase-admin library. When I run the script on Google Cloud with the following command:
gcloud dataproc batches submit \
--project {PROJECT} \
--region europe-west1 \
--subnet {SUBNET} \
pyspark spark_image_matching/main.py \
--jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar \
--deps-bucket={DEPS_BUCKET}
I receive the following error:
Traceback (most recent call last):
  File "/tmp/srvls-batch-0127aaf6-a438-4439-af56-beb1a66f45ed/main.py", line 4, in <module>
    import firebase_admin
ModuleNotFoundError: No module named 'firebase_admin'
I already tried creating a setup.py file that declares the dependency, building an .egg file from it, and passing that file with the --py-files flag. This idea was largely inspired by this repo:
https://github.com/GoogleCloudPlatform/dataproc-templates/blob/main/python/setup.py
To customize the Dataproc Serverless for Spark execution environment, it is recommended to use custom container images: https://cloud.google.com/dataproc-serverless/docs/guides/custom-containers
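As a rough illustration of the custom-container route, the sketch below assembles the same submit command as in the question, with a `--container-image` flag added. It assumes you have already built and pushed an image that has firebase-admin installed. The helper name `build_submit_command` and all values in the usage section (project, subnet, bucket, image URI) are hypothetical placeholders, not taken from the question.

```python
import shlex


def build_submit_command(main_script, project, region, subnet,
                         deps_bucket, container_image, jars=()):
    """Return an argv list for `gcloud dataproc batches submit pyspark`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "gcloud", "dataproc", "batches", "submit", "pyspark", main_script,
        f"--project={project}",
        f"--region={region}",
        f"--subnet={subnet}",
        f"--deps-bucket={deps_bucket}",
        # Assumption: this image was built beforehand with firebase-admin
        # installed (for example via pip in its Dockerfile).
        f"--container-image={container_image}",
    ]
    if jars:
        # gcloud expects a comma-separated list of jar URIs.
        cmd.append("--jars=" + ",".join(jars))
    return cmd


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    command = build_submit_command(
        main_script="spark_image_matching/main.py",
        project="my-project",
        region="europe-west1",
        subnet="my-subnet",
        deps_bucket="gs://my-deps-bucket",
        container_image="europe-west1-docker.pkg.dev/my-project/my-repo/my-image:1.0",
        jars=["gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"],
    )
    # Print a shell-safe version that can be pasted into a terminal.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch free of side effects; to actually submit, you could pass the list to `subprocess.run(command, check=True)`.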
As an alternative, you can take a look at the Spark-supported ways of managing Python dependencies: https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html