How to load images from a Vertex AI managed dataset inside Python training code?
I am trying to create a custom training job in Vertex AI. I created a managed dataset stored in the same bucket I am exporting the training code to. My Python code looks like this:
# Assumed imports (not shown in the original post)
import os
from tensorflow.keras.preprocessing import image

# Defining paths (set by Vertex AI when a managed dataset is attached to the job)
TRAIN_PATH = os.environ['AIP_TRAINING_DATA_URI']
VAL_PATH = os.environ['AIP_VALIDATION_DATA_URI']

# skipped model definition

# Augment the training images; only rescale the validation images
train_datagen = image.ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_dataset = image.ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    TRAIN_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')
validation_generator = test_dataset.flow_from_directory(
    VAL_PATH,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')
hist_new = model.fit(
    train_generator, ...)
The question is, how do I load the images so the ImageDataGenerator can use them? The error I get when starting the training job is:
No such file or directory: 'gs://(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'
It seems that TRAIN_PATH and VAL_PATH should be valid local paths (the TF documentation does not mention other kinds of paths), not GCS URIs. The data set is made available at the specified GCS URI, but the training code should download the data set images from GCS to the local environment and then pass them to the ImageDataGenerator.
For information on downloading the data set from GCS, refer to the Cloud Storage documentation.
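As a rough illustration of the download approach, the sketch below copies every object under a GCS prefix into a local directory using the `google-cloud-storage` client library. The helper names `split_gcs_uri` and `download_prefix` and the local directory `/tmp/train` are hypothetical and not from the original post. The code also assumes that the job's service account can read the bucket and that the trailing `*` in the URI is a wildcard rather than part of an object name.

```python
from pathlib import Path
from typing import List, Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    return bucket, prefix


def download_prefix(uri: str, local_dir: str) -> List[str]:
    """Download every object whose name starts with the URI's prefix."""
    # Imported lazily so split_gcs_uri works without the client library.
    from google.cloud import storage

    bucket_name, prefix = split_gcs_uri(uri)
    # Assumption: a trailing '*' is a wildcard, so match on what precedes it.
    prefix = prefix.rstrip("*")

    client = storage.Client()
    local_paths = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip "directory" placeholder objects
        target = Path(local_dir) / blob.name
        target.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(target))
        local_paths.append(str(target))
    return local_paths


# Example (needs credentials and network access, so it is left commented out):
# import os
# files = download_prefix(os.environ["AIP_TRAINING_DATA_URI"], "/tmp/train")
```

Note that `flow_from_directory` expects one subdirectory per class under the directory it is given, so the downloaded files may still need to be arranged into that layout before they are passed to the ImageDataGenerator.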
If you are using a custom training container on Vertex, you can access GCS URIs through the Cloud Storage FUSE file system. You don't have to do the mounting yourself: the Vertex platform takes care of that when you run a CustomJob. Simply read your paths as files:
'/gcs/(bucket name)/dataset-5820440723492700160-image_classification_multi_label-2022-05-29T10:53:33.245485Z/training-*'
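Because the environment variable still holds a `gs://` URI, one possible way to get the mounted path is to rewrite the prefix. This is a minimal sketch, assuming the bucket is mounted under `/gcs/` as described above; the function names `gcs_uri_to_fuse_path` and `list_matching_files` are hypothetical.

```python
import glob
from typing import List


def gcs_uri_to_fuse_path(uri: str) -> str:
    """Rewrite 'gs://bucket/path' as '/gcs/bucket/path'.

    Anything that is not a gs:// URI is returned unchanged.
    """
    scheme = "gs://"
    if not uri.startswith(scheme):
        return uri
    return "/gcs/" + uri[len(scheme):]


def list_matching_files(uri: str) -> List[str]:
    """Expand a wildcard URI such as '.../training-*' against the mount."""
    return sorted(glob.glob(gcs_uri_to_fuse_path(uri)))


# Example inside a Vertex AI CustomJob (commented out: needs the /gcs mount):
# import os
# train_files = list_matching_files(os.environ["AIP_TRAINING_DATA_URI"])
```

The wildcard is expanded with `glob` because a pattern ending in `*` is not itself a directory, so it cannot be handed to `flow_from_directory` as is.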