
Installing python packages in Serverless Dataproc GCP

I wanted to install some python packages (e.g. python-json-logger) on Serverless Dataproc. Is there a way to do an initialization action to install python packages in Serverless Dataproc? Please let me know.

You have two options:

  1. Using the gcloud command in a terminal:

You can create a custom image with the dependencies (python packages) in GCR (Google Container Registry) and pass its uri as a parameter in the command below, e.g.:

$ gcloud beta dataproc batches submit \
  --container-image=gcr.io/my-project-id/my-image:1.0.1 \
  --project=my-project-id --region=us-central1 \
  --jars=file:///usr/lib/spark/external/spark-avro.jar \
  --subnet=projects/my-project-id/regions/us-central1/subnetworks/my-subnet-name

For details, see the Google Cloud guide on creating a custom container image for Dataproc Serverless for Spark.
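To get such an image into GCR in the first place, the usual Docker workflow applies. A minimal sketch, assuming a Dockerfile in the current directory that installs the desired packages (e.g. python-json-logger) on top of a base image that meets the Dataproc Serverless container requirements:

$ docker build -t gcr.io/my-project-id/my-image:1.0.1 .
$ docker push gcr.io/my-project-id/my-image:1.0.1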

  2. Using the Airflow operator DataprocCreateBatchOperator:

Add the script below to your python file; it installs the desired package and then loads it onto the container path (Dataproc Serverless). The file must be saved in a bucket. It uses the Secret Manager package as an example.

python-file.py

import pip
import sys  # needed for sys.path below (missing from the original snippet)
import importlib
from warnings import warn
from dataclasses import dataclass

def load_package(package, path):
    warn("Update path order. Watch out for importing errors.")
    if path not in sys.path:
        sys.path.insert(0, path)
    module = importlib.import_module(package)
    return importlib.reload(module)

@dataclass
class PackageInfo:
    import_path: str
    pip_id: str

packages = [PackageInfo("google.cloud.secretmanager", "google-cloud-secret-manager==2.4.0")]
path = '/tmp/python_packages'

# Install the packages into a writable path inside the container,
# then put that path on sys.path and (re)load each package.
pip.main(['install', '-t', path, *[package.pip_id for package in packages]])
for package in packages:
    load_package(package.import_path, path=path)
...
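Once the script above has run, the installed package can be imported and used as usual in the rest of the job. A minimal sketch, assuming hypothetical project and secret names:

# Hypothetical usage of the package loaded above (names are placeholders).
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
secret_name = "projects/my-project-id/secrets/my-secret/versions/latest"
response = client.access_secret_version(request={"name": secret_name})
print(response.payload.data.decode("UTF-8"))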

Finally, the operator calls python-file.py:

from airflow.providers.google.cloud.operators.dataproc import DataprocCreateBatchOperator

create_batch = DataprocCreateBatchOperator(
    task_id="batch_create",
    batch={
        "pyspark_batch": {
            "main_python_file_uri": "gs://bucket-name/python-file.py",
            "args": ["value1", "value2"],
            "jar_file_uris": ["gs://bucket-name/jar-file.jar"],
        },
        "environment_config": {
            "execution_config": {
                "subnetwork_uri": "projects/my-project-id/regions/us-central1/subnetworks/my-subnet-name"
            },
        },
    },
    batch_id="batch-create",
)
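For context, here is a minimal sketch of how this operator could sit inside a DAG; the DAG id, schedule, and the explicit project_id and region arguments are assumptions not shown in the answer above (recent versions of the Google provider expect region to be set):

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateBatchOperator

with DAG(
    dag_id="dataproc_serverless_batch",  # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    create_batch = DataprocCreateBatchOperator(
        task_id="batch_create",
        project_id="my-project-id",  # assumed: same project as above
        region="us-central1",        # assumed: same region as above
        batch={
            "pyspark_batch": {
                "main_python_file_uri": "gs://bucket-name/python-file.py",
            },
        },
        batch_id="batch-create",
    )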
