

How to load images from a Vertex AI managed dataset inside Python training code?

I am trying to create a custom training job in Vertex AI. I created a managed dataset stored in the same bucket I am exporting the training code to. I have Python code that looks like this:

import os
# ImageDataGenerator lives in the Keras preprocessing package
from tensorflow.keras.preprocessing import image

# Paths that Vertex AI injects into the training container
TRAIN_PATH = os.environ['AIP_TRAINING_DATA_URI']
VAL_PATH = os.environ['AIP_VALIDATION_DATA_URI']

# skipped model definition #

train_datagen = image.ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = image.ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    TRAIN_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    VAL_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')

hist_new = model.fit(
    train_generator, ...)

The question is: how do I load the images so that the ImageDataGenerator can use them? The error I get when starting the training job is:

 No such file or directory: 'gs://(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'

It seems that TRAIN_PATH and VAL_PATH should be valid local paths (the TF documentation for flow_from_directory does not mention any other kind of path), not GCS URIs. The dataset is made available at the specified GCS URI, so the training code should first download the dataset images from GCS to the local environment and then pass the local directory to the ImageDataGenerator.

For information on downloading the dataset from GCS, refer to this documentation.
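As a rough illustration of that download step, the sketch below copies everything under a GCS prefix into a local directory using the `google-cloud-storage` client library. The function names (`split_gcs_uri`, `download_prefix`) and the example URI are hypothetical, not part of the Vertex AI API; in a real job you would feed it the value of `AIP_TRAINING_DATA_URI` and point `flow_from_directory` at the destination directory.

```python
import os

def split_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix' into (bucket, prefix)."""
    assert uri.startswith("gs://"), uri
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    return bucket, prefix

def download_prefix(gcs_uri, dest_dir):
    """Copy every object under gcs_uri into dest_dir, preserving layout."""
    from google.cloud import storage  # pip install google-cloud-storage
    bucket_name, prefix = split_gcs_uri(gcs_uri)
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        local_path = os.path.join(dest_dir, blob.name)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)

# Hypothetical usage inside the training script:
# download_prefix(os.environ['AIP_TRAINING_DATA_URI'].rstrip('-*'), '/tmp/train')
```

Note that `AIP_TRAINING_DATA_URI` may end in a wildcard pattern (as in the error message above), so the prefix usually needs to be trimmed before listing.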

In case you are using a custom training container on Vertex, you can use GCS URIs via the Cloud Storage FUSE filesystem. You don't have to do the mounting yourself; the Vertex platform takes care of that when you run a CustomJob. Simply read your paths as local files under the /gcs mount:

'/gcs/(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'
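A minimal sketch of that rewrite, assuming the standard `/gcs/<bucket>` mount convention: it turns the `gs://` URIs that Vertex injects into the equivalent FUSE paths, which `flow_from_directory` can then read as ordinary local directories. The helper name `to_fuse_path` is illustrative, not an official API.

```python
import os

def to_fuse_path(gcs_uri):
    """Map 'gs://bucket/path' to the '/gcs/bucket/path' FUSE mount point."""
    prefix = "gs://"
    if gcs_uri.startswith(prefix):
        return "/gcs/" + gcs_uri[len(prefix):]
    return gcs_uri  # already a local path

# e.g. TRAIN_PATH = to_fuse_path(os.environ['AIP_TRAINING_DATA_URI'])
```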
