
Training spaCy model as a Vertex AI Pipeline "Component"

I am trying to train a spaCy model, but turning the code into a Vertex AI Pipeline Component. My current code is:

from kfp.v2.dsl import component
from typing import NamedTuple

@component(
    packages_to_install=[
        "setuptools",
        "wheel", 
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml"
)
def train(train_name: str, dev_name: str) -> NamedTuple("output", [("model_path", str)]):
    """
    Trains a spacy model
    
    Parameters:
    ----------
    train_name : Name of the spaCy "train" set, used for model training.
    dev_name: Name of the spaCy "dev" set, used for model training.
    
    Returns:
    -------
    output : Destination path of the saved model.
    """
    import spacy
    import subprocess
    
    spacy.require_gpu()  # <=== IMAGE FAILS TO BE COMPILED HERE
    
    # NOTE: The remaining code has already been tested and proven to be functional.
    #       It has been edited since the project is private.
    
    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config", "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    location = "gcs/secret_model_destination_path/TestModel"
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", location,
                    "--paths.train", "gcs/secret_bucket/secret_path/{}.spacy".format(train_name),
                    "--paths.dev", "gcs/secret_bucket/secret_path/{}.spacy".format(dev_name),
                    "--gpu-id", "0"])
    
    return (location,)

The Vertex AI logs show the following as the main cause of the failure:

[screenshot of the Vertex AI error log]

The libraries get installed successfully, but I feel some library or setting is missing (based on my experience); however, I don't know how to make it "compatible with Python-based Vertex AI components". By the way, using a GPU is a must in my code.

Any ideas?

After some experimentation, I think I have figured out what my code was missing. The train component definition was in fact correct (with some minor tweaks with respect to what was originally posted); what was missing was the GPU definition in the pipeline. I will first include a full dummy sample, which trains an NER model with spaCy and orchestrates everything through a Vertex AI Pipeline:

from kfp.v2 import compiler
from kfp.v2.dsl import pipeline, component, Dataset, Input, Output, OutputPath, InputPath
from datetime import datetime
from google.cloud import aiplatform
from typing import NamedTuple

# Placeholder: set this to your own GCS location for pipeline artifacts
PIPELINE_ROOT = "gs://your-bucket/pipeline-root"


# Component definition

@component(
    packages_to_install=[
        "setuptools",
        "wheel", 
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="generate.yaml"
)
def generate_spacy_file(train_path: OutputPath(), dev_path: OutputPath()):
    """
    Generates a small, dummy 'train.spacy' & 'dev.spacy' file
    
    Returns:
    -------
    train_path : Relative location in GCS, for the "train.spacy" file.
    dev_path: Relative location in GCS, for the "dev.spacy" file.
    """
    import spacy
    from spacy.training import Example
    from spacy.tokens import DocBin

    td = [    # Train (dummy) dataset, in 'spacy V2 presentation'
              ("Walmart is a leading e-commerce company", {"entities": [(0, 7, "ORG")]}),
              ("I reached Chennai yesterday.", {"entities": [(19, 28, "GPE")]}),
              ("I recently ordered a book from Amazon", {"entities": [(24,32, "ORG")]}),
              ("I was driving a BMW", {"entities": [(16,19, "PRODUCT")]}),
              ("I ordered this from ShopClues", {"entities": [(20,29, "ORG")]}),
              ("Fridge can be ordered in Amazon ", {"entities": [(0,6, "PRODUCT")]}),
              ("I bought a new Washer", {"entities": [(16,22, "PRODUCT")]}),
              ("I bought a old table", {"entities": [(16,21, "PRODUCT")]}),
              ("I bought a fancy dress", {"entities": [(18,23, "PRODUCT")]}),
              ("I rented a camera", {"entities": [(12,18, "PRODUCT")]}),
              ("I rented a tent for our trip", {"entities": [(12,16, "PRODUCT")]}),
              ("I rented a screwdriver from our neighbour", {"entities": [(12,22, "PRODUCT")]}),
              ("I repaired my computer", {"entities": [(15,23, "PRODUCT")]}),
              ("I got my clock fixed", {"entities": [(16,21, "PRODUCT")]}),
              ("I got my truck fixed", {"entities": [(16,21, "PRODUCT")]}),
    ]
    
    dd = [    # Development (dummy) dataset (CV), in 'spacy V2 presentation'
              ("Flipkart started it's journey from zero", {"entities": [(0,8, "ORG")]}),
              ("I recently ordered from Max", {"entities": [(24,27, "ORG")]}),
              ("Flipkart is recognized as leader in market",{"entities": [(0,8, "ORG")]}),
              ("I recently ordered from Swiggy", {"entities": [(24,29, "ORG")]})
    ]

    
    # Converting Train & Development datasets, from 'spaCy V2' to 'spaCy V3'
    nlp = spacy.blank("en")
    db_train = DocBin()
    db_dev = DocBin()

    for text, annotations in td:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        db_train.add(example.reference)
        
    for text, annotations in dd:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        db_dev.add(example.reference)
    
    db_train.to_disk(train_path + ".spacy")  # <== Obtaining and storing "train.spacy"
    db_dev.to_disk(dev_path + ".spacy")      # <== Obtaining and storing "dev.spacy"
    

# ----------------------- ORIGINALLY POSTED CODE -----------------------

@component(
    packages_to_install=[
        "setuptools",
        "wheel", 
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml"
)
def train(train_path: InputPath(), dev_path: InputPath(), output_path: OutputPath()):
    """
    Trains a spacy model
    
    Parameters:
    ----------
    train_path : Relative location in GCS, for the "train.spacy" file.
    dev_path: Relative location in GCS, for the "dev.spacy" file.
    
    Returns:
    -------
    output : Destination path of the saved model.
    """
    import spacy
    import subprocess
    
    spacy.require_gpu()  # <=== IMAGE NOW MANAGES TO GET BUILT!

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config", "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", output_path,
                    "--paths.train", "{}.spacy".format(train_path),
                    "--paths.dev", "{}.spacy".format(dev_path),
                    "--gpu-id", "0"])

# ----------------------------------------------------------------------
    

# Pipeline definition

@pipeline(
    pipeline_root=PIPELINE_ROOT,
    name="spacy-dummy-pipeline",
)
def spacy_pipeline():
    """
    Builds a custom pipeline
    """
    # Generating dummy "train.spacy" + "dev.spacy"
    train_dev_sets = generate_spacy_file()
    # With the output of the previous component, train a spaCy model
    model = train(
        train_dev_sets.outputs["train_path"],
        train_dev_sets.outputs["dev_path"]
    
    # ------ !!! THIS SECTION DOES THE TRICK !!! ------
    ).add_node_selector_constraint(
        label_name="cloud.google.com/gke-accelerator",
        value="NVIDIA_TESLA_T4"
    ).set_gpu_limit(1).set_memory_limit('32G')
    # -------------------------------------------------

# Pipeline compilation   

compiler.Compiler().compile(
    pipeline_func=spacy_pipeline, package_path="pipeline_spacy_job.json"
)


# Pipeline run

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

run = aiplatform.PipelineJob(  # Include your own naming here
    display_name="spacy-dummy-pipeline",
    template_path="pipeline_spacy_job.json",
    job_id="ml-pipeline-spacydummy-small-{0}".format(TIMESTAMP),
    parameter_values={},
    enable_caching=True,
)


# Pipeline gets submitted

run.submit()

Now, the explanation; according to Google:

By default, the component will run as a Vertex AI CustomJob using an e2-standard-4 machine, which has 4 CPU cores and 16GB of memory.

Therefore, when the train component got compiled, it failed because "it did not see any GPU available as a resource"; however, that same link lists all the available settings for CPU and GPU. As you can see, in my case I set the train component to run on one (1) NVIDIA_TESLA_T4 GPU card, and I also increased the memory to 32GB. With these modifications, the resulting pipeline looks as follows:

[screenshot of the resulting pipeline graph in Vertex AI]

As you can see, it now compiles successfully, and it trains (and eventually stores) a functional spaCy model. From here, you can tweak this code to fit your own needs.
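One such tweak: the GPU, CPU and memory settings are all plain methods on the task object that a component invocation returns inside the pipeline function. Below is a minimal sketch of just that part, assuming the generate_spacy_file and train components defined above and kfp 1.8.x with the v2 DSL; set_cpu_limit is an extra knob that the pipeline above does not use:

from kfp.v2.dsl import pipeline

@pipeline(name="resource-settings-sketch")
def resource_settings_sketch():
    # Same wiring as above: generate the dummy .spacy files, then train on them
    train_dev_sets = generate_spacy_file()
    train_task = train(
        train_dev_sets.outputs["train_path"],
        train_dev_sets.outputs["dev_path"]
    )
    # Schedule the training step on a node with one NVIDIA T4 attached...
    train_task.add_node_selector_constraint(
        label_name="cloud.google.com/gke-accelerator",
        value="NVIDIA_TESLA_T4"
    )
    train_task.set_gpu_limit(1)         # number of accelerator cards
    # ...and raise CPU / memory above the e2-standard-4 defaults if needed
    train_task.set_cpu_limit("8")
    train_task.set_memory_limit("32G")

The same calls can of course be chained, as in the pipeline above.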

I hope this helps anyone who might be interested.

Thank you.

Remove the failing line, i.e. spacy.require_gpu()  # <=== IMAGE FAILS TO BE COMPILED HERE

Also adjust the install to remove the CUDA extra, cuda113,.

Your code is set up to use a GPU, but for a learning exercise you don't need one. Neither I nor you know (yet) how to specify a GPU-enabled Python Vertex AI GCP instance, so drop the GPU requirement. Once the code runs, you can go back and tweak it to add the GPU.
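As a concrete illustration of that advice, here is a minimal, untested sketch of a CPU-only variant of the question's train component; the config and bucket paths are the same placeholders as in the question, and the names train_cpu / train_cpu.yaml are mine:

from kfp.v2.dsl import component

@component(
    packages_to_install=[
        "setuptools",
        "wheel",
        "spacy[transformers,lookups]",   # note: no cuda113 extra
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train_cpu.yaml"
)
def train_cpu(train_name: str, dev_name: str):
    """Trains a spaCy model on CPU only (no spacy.require_gpu(), no --gpu-id)."""
    import subprocess

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config",
                    "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model (CPU)
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", "gcs/secret_model_destination_path/TestModel",
                    "--paths.train", "gcs/secret_bucket/secret_path/{}.spacy".format(train_name),
                    "--paths.dev", "gcs/secret_bucket/secret_path/{}.spacy".format(dev_name)])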

To add the GPU back later, install CUDA (and optionally the transformer model) in one cell:

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/ /"
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-11-2
python -m spacy download en_core_web_trf # optional

In another cell, install the other pip packages and dependencies:

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Point to the correct CUDA folder:

export CUDA_PATH="/usr/local/cuda-11"

Install spaCy with transformers support:

pip install -U spacy[cuda113,transformers]

and additionally:

pip install cupy-cuda113

Now that the libraries, packages and cells are correctly located and installed, this should work:

>>> import spacy
>>> spacy.require_gpu()
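If everything is in place, spacy.require_gpu() returns True instead of raising an error. As an additional, optional check on the CuPy side (assuming the cupy-cuda113 wheel installed above), you can also ask CUDA how many devices it sees:

>>> import cupy
>>> cupy.cuda.runtime.getDeviceCount()  # should be >= 1 when a GPU is attached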
