Create a Google Batch Job from a Compute Engine instance template, not importing custom Python libraries correctly
I am trying to start a Google Batch Job from an instance template that contains some custom Python packages, but the Batch Job always fails, telling me the package I imported does not exist. These are the steps I followed (read all of them carefully before rushing towards an answer):
1. Create a Compute Engine VM instance; for reference, let's call it source-vm.
2. Start and connect to that VM through SSH. In source-vm, install spaCy, running the following commands in the CLI:
sudo apt update
sudo apt install python3-pip
sudo pip install spacy==3.2.1
sudo python -m spacy download en_core_web_sm
3. Build /scripts/test.py on source-vm. This can be considered the main script to be run later on, in the Google Batch Job (myconfig.json, on Step "7"):
import spacy
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
# Read arguments from CLI
parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument(
"-t", "--task_count",
type=str,
default="0",
help="Task count",
choices={"0", "1", "2", "3", "4", "5"}
)
args = vars(parser.parse_args())
task_count = args["task_count"]
# Read data from Google Cloud Storage mounted data
MOUNTED_GCS_URI="/mnt/disks/test-bucket/input-test/sample-{}.txt".format(task_count.zfill(12))
with open(MOUNTED_GCS_URI,"r") as f:
mytext = f.read()
# Import a test spaCy model
nlp=spacy.load("en_core_web_sm")
# NLP process: Entity Extraction
doc=nlp(mytext)
# Gather all found entities
found_entities=[]
for ent in doc.ents:
found_entities.append(
{"word":doc.text[ent.start_char:ent.end_char], "label":ent.label_}
)
# Print results
print({"text":doc.text, "entities":found_entities})
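For a quick local check of the argument parsing and the path construction in the script above (outside the Batch Job, and without needing spaCy), the relevant part can be exercised against an explicit argv list; "3" here is just a sample task index:

```python
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

# Same argument definition as in /scripts/test.py, parsed here against an
# explicit argv list so it can be run outside the Batch Job
parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument(
    "-t", "--task_count",
    type=str,
    default="0",
    help="Task count",
    choices={"0", "1", "2", "3", "4", "5"}
)
args = vars(parser.parse_args(["-t", "3"]))

# zfill(12) pads the task index to 12 digits, matching the naming of the
# input files in the mounted bucket (sample-000000000000.txt, ...)
uri = "/mnt/disks/test-bucket/input-test/sample-{}.txt".format(
    args["task_count"].zfill(12))
print(uri)  # /mnt/disks/test-bucket/input-test/sample-000000000003.txt
```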
4. Shut down source-vm and, with it, create a machine image from the VM; for reference, let's call it base-image.
5. Create an instance template; for this, I used both source-vm and base-image. For reference, let's call it my-instance-template.
6. (Optional) Create a VM instance from the instance template, as a quick test of my-instance-template. This second VM instance was called test-vm. After creating test-vm, I started and connected to it through SSH, then ran the following commands:
printf '\nINSTALLATION LOCATIONS, FOR PYTHON, PIP & SPACY:\n'
which python3 && which pip && which spacy
printf '\nVERSIONS, FOR PYTHON, PIP & SPACY:\n'
python3 --version && pip --version && python3 -c 'import spacy;print("spaCy version:",spacy.__version__)'
printf '\nWHAT IS IN PATH:\n'
echo $PATH
This did not trigger any error, and produced the following output:
INSTALLATION LOCATIONS, FOR PYTHON, PIP & SPACY:
/usr/bin/python3
/usr/bin/pip
/usr/local/bin/spacy
VERSIONS, FOR PYTHON, PIP & SPACY:
Python 3.9.2
pip 20.3.4 from /usr/lib/python3/dist-packages/pip (python 3.9)
spaCy version: 3.2.1
WHAT IS IN PATH:
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
Notice how I did not have to install anything or do any additional setup on test-vm; this is expected.
7. After successfully testing my-instance-template, start a custom Google Batch Job:
gcloud batch jobs submit batch-job-1 \
--location us-central1 \
--config myconfig.json
Where myconfig.json is:
{
"taskGroups": [
{
"taskSpec": {
"runnables": [
{
"script": {
"text": "python3 /scripts/test.py -t ${BATCH_TASK_INDEX} >> /mnt/disks/test-bucket/output-test/output-${BATCH_TASK_INDEX}.txt"
}
}
],
"volumes": [
{
"gcs": {
"remotePath": "test-bucket"
},
"mountPath": "/mnt/disks/test-bucket"
}
],
"computeResource": {
"cpuMilli": 2000,
"memoryMib": 2000
},
"maxRetryCount": 0,
"maxRunDuration": "600s"
},
"taskCount": 6,
"parallelism": 2
}
],
"allocationPolicy": {
"instances": [
{
"installGpuDrivers": false,
"instanceTemplate": "my-instance-template"
}
]
},
"labels": {
"department": "my-department",
"env": "testing"
},
"logsPolicy": {
"destination": "CLOUD_LOGGING"
}
}
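For reference, the script line in "runnables" is executed once per task, with the BATCH_TASK_INDEX environment variable set by Batch to that task's index. A small sketch (plain string substitution, just to illustrate) of what the six tasks ("taskCount": 6) end up running:

```python
# Sketch: the command each of the 6 tasks executes. Batch sets
# BATCH_TASK_INDEX in every task's environment; the shell running
# "script.text" substitutes it before executing the command.
template = ("python3 /scripts/test.py -t {i} "
            ">> /mnt/disks/test-bucket/output-test/output-{i}.txt")

for i in range(6):  # taskCount: 6, executed 2 at a time (parallelism: 2)
    print(template.format(i=i))
```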
This Batch Job failed, with the following error obtained from the Cloud Logs:
Traceback (most recent call last):
File "/scripts/test.py", line 1, in <module>
import spacy
ModuleNotFoundError: No module named 'spacy'
QUESTION:
Why does my Google Batch Job from a Compute Engine instance template fail, telling me "it did not find spaCy" (Step "7"), while when the exact same Compute Engine instance template is used to build an isolated VM instance (Step "6"), everything works fine and the spaCy library is imported correctly?
I just solved this case, after realizing that all the packages installed in Step "2" were installed using sudo. Therefore, the myconfig.json runnable (Step "7") should be slightly modified as follows:
...
"taskSpec": {
"runnables": [
{
"script": {
"text": "sudo python3 /scripts/test.py -t ${BATCH_TASK_INDEX} >> /mnt/disks/test-bucket/output-test/output-${BATCH_TASK_INDEX}.txt"
}
}
],
...
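As a general debugging aid for this kind of ModuleNotFoundError (a side note, not specific to Batch): importlib.util.find_spec tells you whether, and from where, the current interpreter can import a module, which helps pin down per-user or per-environment visibility problems like the one above. The missing-package name below is deliberately fictitious:

```python
import importlib.util

def module_origin(name: str):
    """Return the file a top-level module would be imported from,
    or None if it is not importable by the current interpreter/user."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# A stdlib module is always importable:
print(module_origin("json") is not None)      # True
# A package that is not installed (or not visible to this user) yields None,
# which is the situation that produced the ModuleNotFoundError in the task:
print(module_origin("some_package_that_is_not_installed"))  # None
```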