
Training spaCy model as a Vertex AI Pipeline "Component"

I am trying to train a spaCy model, but turning the code into a Vertex AI Pipeline Component. My current code is:

from kfp.v2.dsl import component
from typing import NamedTuple

@component(
    packages_to_install=[
        "setuptools",
        "wheel", 
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml"
)
def train(train_name: str, dev_name: str) -> NamedTuple("output", [("model_path", str)]):
    """
    Trains a spacy model
    
    Parameters:
    ----------
    train_name : Name of the spaCy "train" set, used for model training.
    dev_name: Name of the spaCy "dev" set, used for model training.
    
    Returns:
    -------
    output : Destination path of the saved model.
    """
    import spacy
    import subprocess
    
    spacy.require_gpu()  # <=== IMAGE FAILS TO BE COMPILED HERE
    
    # NOTE: The remaining code has already been tested and proven to be functional.
    #       It has been edited since the project is private.
    
    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config", "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    location = "gcs/secret_model_destination_path/TestModel"
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", location,
                    "--paths.train", "gcs/secret_bucket/secret_path/{}.spacy".format(train_name),
                    "--paths.dev", "gcs/secret_bucket/secret_path/{}.spacy".format(dev_name),
                    "--gpu-id", "0"])
    
    return (location,)

The Vertex AI logs show the following as the main cause of the failure:

[screenshot of the Vertex AI error log]

The libraries get installed successfully, but I feel some library or setting is missing (based on my experience); however, I don't know how to make it "compatible with Python-based Vertex AI components". By the way, using a GPU is a must in my code.

Any ideas?

After some experimentation, I think I have figured out what my code was missing. The train component definition was in fact correct (with some minor tweaks with respect to what was originally posted); what was missing was the GPU definition in the pipeline. I will first include a full dummy sample, which trains an NER model with spaCy and orchestrates everything through a Vertex AI Pipeline:

from kfp.v2 import compiler
from kfp.v2.dsl import pipeline, component, Dataset, Input, Output, OutputPath, InputPath
from datetime import datetime
from google.cloud import aiplatform
from typing import NamedTuple

# Placeholder: set this to your own GCS location for pipeline artifacts
PIPELINE_ROOT = "gs://your-bucket/pipeline-root"


# Component definition

@component(
    packages_to_install=[
        "setuptools",
        "wheel", 
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="generate.yaml"
)
def generate_spacy_file(train_path: OutputPath(), dev_path: OutputPath()):
    """
    Generates a small, dummy 'train.spacy' & 'dev.spacy' file
    
    Returns:
    -------
    train_path : Relative location in GCS, for the "train.spacy" file.
    dev_path: Relative location in GCS, for the "dev.spacy" file.
    """
    import spacy
    from spacy.training import Example
    from spacy.tokens import DocBin

    td = [    # Train (dummy) dataset, in 'spacy V2 presentation'
              ("Walmart is a leading e-commerce company", {"entities": [(0, 7, "ORG")]}),
              ("I reached Chennai yesterday.", {"entities": [(19, 28, "GPE")]}),
              ("I recently ordered a book from Amazon", {"entities": [(24,32, "ORG")]}),
              ("I was driving a BMW", {"entities": [(16,19, "PRODUCT")]}),
              ("I ordered this from ShopClues", {"entities": [(20,29, "ORG")]}),
              ("Fridge can be ordered in Amazon ", {"entities": [(0,6, "PRODUCT")]}),
              ("I bought a new Washer", {"entities": [(16,22, "PRODUCT")]}),
              ("I bought a old table", {"entities": [(16,21, "PRODUCT")]}),
              ("I bought a fancy dress", {"entities": [(18,23, "PRODUCT")]}),
              ("I rented a camera", {"entities": [(12,18, "PRODUCT")]}),
              ("I rented a tent for our trip", {"entities": [(12,16, "PRODUCT")]}),
              ("I rented a screwdriver from our neighbour", {"entities": [(12,22, "PRODUCT")]}),
              ("I repaired my computer", {"entities": [(15,23, "PRODUCT")]}),
              ("I got my clock fixed", {"entities": [(16,21, "PRODUCT")]}),
              ("I got my truck fixed", {"entities": [(16,21, "PRODUCT")]}),
    ]
    
    dd = [    # Development (dummy) dataset (CV), in 'spacy V2 presentation'
              ("Flipkart started it's journey from zero", {"entities": [(0,8, "ORG")]}),
              ("I recently ordered from Max", {"entities": [(24,27, "ORG")]}),
              ("Flipkart is recognized as leader in market",{"entities": [(0,8, "ORG")]}),
              ("I recently ordered from Swiggy", {"entities": [(24,29, "ORG")]})
    ]

    
    # Converting Train & Development datasets, from 'spaCy V2' to 'spaCy V3'
    nlp = spacy.blank("en")
    db_train = DocBin()
    db_dev = DocBin()

    for text, annotations in td:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        db_train.add(example.reference)
        
    for text, annotations in dd:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        db_dev.add(example.reference)
    
    db_train.to_disk(train_path + ".spacy")  # <== Obtaining and storing "train.spacy"
    db_dev.to_disk(dev_path + ".spacy")      # <== Obtaining and storing "dev.spacy"
    

# ----------------------- ORIGINALLY POSTED CODE -----------------------

@component(
    packages_to_install=[
        "setuptools",
        "wheel", 
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml"
)
def train(train_path: InputPath(), dev_path: InputPath(), output_path: OutputPath()):
    """
    Trains a spacy model
    
    Parameters:
    ----------
    train_path : Relative location in GCS, for the "train.spacy" file.
    dev_path: Relative location in GCS, for the "dev.spacy" file.
    
    Returns:
    -------
    output : Destination path of the saved model.
    """
    import spacy
    import subprocess
    
    spacy.require_gpu()  # <=== IMAGE NOW MANAGES TO GET BUILT!

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config", "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", output_path,
                    "--paths.train", "{}.spacy".format(train_path),
                    "--paths.dev", "{}.spacy".format(dev_path),
                    "--gpu-id", "0"])

# ----------------------------------------------------------------------
    

# Pipeline definition

@pipeline(
    pipeline_root=PIPELINE_ROOT,
    name="spacy-dummy-pipeline",
)
def spacy_pipeline():
    """
    Builds a custom pipeline
    """
    # Generating dummy "train.spacy" + "dev.spacy"
    train_dev_sets = generate_spacy_file()
    # With the output of the previous component, train a spaCy model
    model = train(
        train_dev_sets.outputs["train_path"],
        train_dev_sets.outputs["dev_path"]
    
    # ------ !!! THIS SECTION DOES THE TRICK !!! ------
    ).add_node_selector_constraint(
        label_name="cloud.google.com/gke-accelerator",
        value="NVIDIA_TESLA_T4"
    ).set_gpu_limit(1).set_memory_limit('32G')
    # -------------------------------------------------

# Pipeline compilation   

compiler.Compiler().compile(
    pipeline_func=spacy_pipeline, package_path="pipeline_spacy_job.json"
)


# Pipeline run

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

run = aiplatform.PipelineJob(  # Include your own naming here
    display_name="spacy-dummy-pipeline",
    template_path="pipeline_spacy_job.json",
    job_id="ml-pipeline-spacydummy-small-{0}".format(TIMESTAMP),
    parameter_values={},
    enable_caching=True,
)


# Pipeline gets submitted

run.submit()

Now, the explanation; according to Google:

By default, the component will run as a Vertex AI CustomJob using an e2-standard-4 machine, which has 4 CPU cores and 16GB of memory.

Therefore, when the train component got compiled, it failed because "it did not see any GPU available as a resource"; however, that same link lists all the available settings for CPU and GPU. As you can see, in my case I set the train component to run on one (1) NVIDIA_TESLA_T4 GPU card, and I also increased the memory to 32GB. With these modifications, the resulting pipeline looks as follows:

[screenshot of the resulting pipeline graph in Vertex AI]

As you can see, it now compiles successfully, and it trains (and eventually stores) a functional spaCy model. From here, you can tweak this code to fit your own needs.
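One such tweak: the GPU, CPU and memory settings are all plain methods on the task object that a component invocation returns inside the pipeline function. Below is a minimal sketch of just that part, assuming the generate_spacy_file and train components defined above and kfp 1.8.x with the v2 DSL; set_cpu_limit is an extra knob that the pipeline above does not use:

from kfp.v2.dsl import pipeline

@pipeline(name="resource-settings-sketch")
def resource_settings_sketch():
    # Same wiring as above: generate the dummy .spacy files, then train on them
    train_dev_sets = generate_spacy_file()
    train_task = train(
        train_dev_sets.outputs["train_path"],
        train_dev_sets.outputs["dev_path"]
    )
    # Schedule the training step on a node with one NVIDIA T4 attached...
    train_task.add_node_selector_constraint(
        label_name="cloud.google.com/gke-accelerator",
        value="NVIDIA_TESLA_T4"
    )
    train_task.set_gpu_limit(1)         # number of accelerator cards
    # ...and raise CPU / memory above the e2-standard-4 defaults if needed
    train_task.set_cpu_limit("8")
    train_task.set_memory_limit("32G")

The same calls can of course be chained, as in the pipeline above.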

I hope this helps anyone who might be interested.

Thank you.

Remove the failing line, i.e. spacy.require_gpu()  # <=== IMAGE FAILS TO BE COMPILED HERE

Also adjust the install to remove the CUDA extra, cuda113,.

Your code is set up to use a GPU, but for a learning exercise you don't need one. Neither I nor you know (yet) how to specify a GPU-enabled Python Vertex AI GCP instance, so drop the GPU requirement. Once the code runs, you can go back and tweak it to add the GPU.
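As a concrete illustration of that advice, here is a minimal, untested sketch of a CPU-only variant of the question's train component; the config and bucket paths are the same placeholders as in the question, and the names train_cpu / train_cpu.yaml are mine:

from kfp.v2.dsl import component

@component(
    packages_to_install=[
        "setuptools",
        "wheel",
        "spacy[transformers,lookups]",   # note: no cuda113 extra
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train_cpu.yaml"
)
def train_cpu(train_name: str, dev_name: str):
    """Trains a spaCy model on CPU only (no spacy.require_gpu(), no --gpu-id)."""
    import subprocess

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config",
                    "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model (CPU)
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", "gcs/secret_model_destination_path/TestModel",
                    "--paths.train", "gcs/secret_bucket/secret_path/{}.spacy".format(train_name),
                    "--paths.dev", "gcs/secret_bucket/secret_path/{}.spacy".format(dev_name)])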

To add the GPU back later, install CUDA (and optionally the transformer model) in one cell:

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/ /"
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-11-2
python -m spacy download en_core_web_trf # optional

In another cell, install the other pip packages and dependencies:

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Point to the correct CUDA folder:

export CUDA_PATH="/usr/local/cuda-11"

Install spaCy with transformers support:

pip install -U spacy[cuda113,transformers]

and additionally:

pip install cupy-cuda113

Now that the libraries, packages and cells are correctly located and installed, this should work:

>>> import spacy
>>> spacy.require_gpu()
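If everything is in place, spacy.require_gpu() returns True instead of raising an error. As an additional, optional check on the CuPy side (assuming the cupy-cuda113 wheel installed above), you can also ask CUDA how many devices it sees:

>>> import cupy
>>> cupy.cuda.runtime.getDeviceCount()  # should be >= 1 when a GPU is attached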
