
Create a Google Batch Job from a Compute Engine instance template, not importing custom Python libraries correctly

I am trying to start a Google Batch job from an instance template that contains some custom Python packages, but the job always fails, reporting that the package I import does not exist. These are the steps I followed (read all of them carefully before rushing towards an answer):

  1. Create a VM instance from a public image; for reference, let's call it source-vm. Start that VM and connect to it through SSH.
  2. On source-vm, install spaCy by running the following commands in the CLI:
sudo apt update
sudo apt install python3-pip
sudo pip install spacy==3.2.1
sudo python -m spacy download en_core_web_sm
  3. Create /scripts/test.py on source-vm. This is the main script to be run later by the Google Batch job (myconfig.json, in Step "7"):
import spacy
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

# Read arguments from CLI
parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument(
    "-t", "--task_count",
    type=str,
    default="0",
    help="Task count",
    choices={"0", "1", "2", "3", "4", "5"}
)
args = vars(parser.parse_args())
task_count = args["task_count"]

# Read data from Google Cloud Storage mounted data
MOUNTED_GCS_URI="/mnt/disks/test-bucket/input-test/sample-{}.txt".format(task_count.zfill(12))
with open(MOUNTED_GCS_URI,"r") as f:
    mytext = f.read()

# Import a test spaCy model
nlp=spacy.load("en_core_web_sm")

# NLP process: Entity Extraction
doc=nlp(mytext)

# Gather all found entities
found_entities=[]
for ent in doc.ents:
    found_entities.append(
        {"word":doc.text[ent.start_char:ent.end_char], "label":ent.label_}
    )

# Print results
print({"text":doc.text, "entities":found_entities})
  4. Power off source-vm and create a machine image from it; for reference, let's call it base-image.
  5. Create an instance template based on an existing instance; for this purpose, I used both source-vm and base-image. For reference, let's call it my-instance-template.
  6. (OPTIONAL) Create a VM instance from the instance template as a quick test of my-instance-template. This second VM instance was called test-vm. After creating test-vm, I started it, connected to it through SSH, and ran the following commands:
printf '\nINSTALLATION LOCATIONS, FOR PYTHON, PIP & SPACY:\n'
which python3 && which pip && which spacy
printf '\nVERSIONS, FOR PYTHON, PIP & SPACY:\n'
python3 --version && pip --version && python3 -c 'import spacy;print("spaCy version:",spacy.__version__)'
printf '\nWHAT IS IN PATH:\n'
echo $PATH

This did not trigger any error and produced the following output:

INSTALLATION LOCATIONS, FOR PYTHON, PIP & SPACY:
/usr/bin/python3
/usr/bin/pip
/usr/local/bin/spacy

VERSIONS, FOR PYTHON, PIP & SPACY:
Python 3.9.2
pip 20.3.4 from /usr/lib/python3/dist-packages/pip (python 3.9)
spaCy version: 3.2.1

WHAT IS IN PATH:
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

Notice that I did not have to install anything or change any settings on test-vm; this is expected.

  7. After successfully testing my-instance-template, submit a custom Google Batch job:
gcloud batch jobs submit batch-job-1 \
  --location us-central1 \
  --config myconfig.json

Where myconfig.json is:

{
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "script": {
                            "text": "python3 /scripts/test.py -t ${BATCH_TASK_INDEX} >> /mnt/disks/test-bucket/output-test/output-${BATCH_TASK_INDEX}.txt"
                        }
                    }
                ],

                "volumes": [
                    {
                        "gcs": {
                            "remotePath": "test-bucket"
                        },
                        "mountPath": "/mnt/disks/test-bucket"
                    }
                ],

                "computeResource": {
                    "cpuMilli": 2000,
                    "memoryMib": 2000
                },
                "maxRetryCount": 0,
                "maxRunDuration": "600s"
            },
            "taskCount": 6,
            "parallelism": 2
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "installGpuDrivers": false,
                "instanceTemplate": "my-instance-template"
            }
        ]
    },
    "labels": {
        "department": "my-department",
        "env": "testing"
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}
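
To make the link between this config and test.py concrete, the sketch below lists which input file each task reads and which output file it appends to. It assumes Batch sets BATCH_TASK_INDEX to 0 through taskCount - 1; the helper name task_paths is hypothetical and only mirrors the format strings used above.

```python
# Values copied from myconfig.json and test.py above.
TASK_COUNT = 6  # "taskCount" in myconfig.json
MOUNT_PATH = "/mnt/disks/test-bucket"  # "mountPath" of the GCS volume


def task_paths(task_index):
    """Return the (input, output) paths used by the task with this index.

    Hypothetical helper: it mirrors the zfill(12) naming in test.py and the
    output redirection in the runnable's script text.
    """
    index = str(task_index)
    input_path = "{}/input-test/sample-{}.txt".format(MOUNT_PATH, index.zfill(12))
    output_path = "{}/output-test/output-{}.txt".format(MOUNT_PATH, index)
    return input_path, output_path


if __name__ == "__main__":
    for i in range(TASK_COUNT):
        source, target = task_paths(i)
        print("task {}: reads {} -> appends to {}".format(i, source, target))
```

With "parallelism": 2, at most two of these six tasks run at the same time.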

This Batch job failed with the following error, taken from Cloud Logging:

Traceback (most recent call last):
  File "/scripts/test.py", line 1, in <module>
    import spacy
ModuleNotFoundError: No module named 'spacy'
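
One way to narrow down an error like this is to run the same small diagnostic both over SSH on test-vm and as the Batch runnable, then compare the output. The sketch below is only an illustration: the function name describe_environment is hypothetical, and it assumes a Linux VM (os.geteuid is not available on Windows). It uses the standard library only, so it runs even when spaCy cannot be imported.

```python
import importlib.util
import os
import sys


def describe_environment(module_name="spacy"):
    """Collect facts that usually explain why an import works in one place
    and fails in another: the user, the interpreter, and the search path."""
    spec = importlib.util.find_spec(module_name)
    return {
        "effective_uid": os.geteuid(),  # 0 means the process runs as root
        "home": os.environ.get("HOME"),
        "path_env": os.environ.get("PATH"),
        "pythonpath_env": os.environ.get("PYTHONPATH"),
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "sys_path": list(sys.path),
        "module_found": spec is not None,
        "module_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If the two runs differ in effective_uid, executable, or sys_path, that difference is the first place to look.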

QUESTION:

Why does my Google Batch job, created from a Compute Engine instance template, fail and tell me it did not find spaCy (Step "7"), when the exact same instance template, used to build a standalone VM instance (Step "6"), works fine and imports the spaCy library correctly?

ANSWER:

I solved this case after realizing that all the packages in Step "2" were installed using sudo. Therefore, the runnable in myconfig.json (Step "7") should be slightly modified so that the script also runs with sudo:

...
"taskSpec": {
    "runnables": [
        {
            "script": {
                "text": "sudo python3 /scripts/test.py -t ${BATCH_TASK_INDEX} >> /mnt/disks/test-bucket/output-test/output-${BATCH_TASK_INDEX}.txt"
            }
        }
    ],
...
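
If the config is generated or edited by a script instead of by hand, the same change can be applied programmatically. This is a minimal sketch, not part of the original fix: the function name prefix_script_runnables is hypothetical, and the example dictionary is a trimmed copy of the config above.

```python
import copy
import json


def prefix_script_runnables(config, prefix="sudo "):
    """Return a copy of a Batch job config in which every script runnable's
    text starts with `prefix`. The input dictionary is left untouched."""
    patched = copy.deepcopy(config)
    for group in patched.get("taskGroups", []):
        for runnable in group.get("taskSpec", {}).get("runnables", []):
            script = runnable.get("script")
            # Skip non-script runnables and scripts that already have the prefix.
            if script and "text" in script and not script["text"].startswith(prefix):
                script["text"] = prefix + script["text"]
    return patched


# Trimmed copy of myconfig.json, kept short for illustration.
example_config = {
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {"script": {"text": "python3 /scripts/test.py -t ${BATCH_TASK_INDEX}"}}
                ]
            },
            "taskCount": 6,
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(prefix_script_runnables(example_config), indent=4))
```

Applying the function twice gives the same result as applying it once, so it is safe to rerun on an already patched config.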
