Loading data onto the GPU before training starts in Google Colab

Question

I am using a subset of the PlantVillage image dataset stored on my Google Drive and trying to train CNN models on it from Google Colab (using a GPU, of course). The problem is that the first epoch of training runs very slowly because the data is being loaded into the GPU for the first time. Later epochs run much faster and take a predictable amount of time. Is it possible to do the loading before training, so that it is excluded from the training time? I want to measure my training time with %%time, and this extra loading time distorts the measurement.

I use TensorFlow and Keras applications for data preprocessing and model training.

Answer 1

You can use Dataset.cache() and Dataset.prefetch(). cache() keeps the data in memory after it has been loaded from disk once, and prefetch() prepares upcoming batches while the model is still working on the current one. Together they speed up training after the first pass over the data.

Check the code below:

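import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Cache elements on the first iteration, then prefetch batches during training.
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)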

Please have a look at this link for your reference.
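
Since cache() only fills on the first full iteration over the dataset, one way to keep that cost out of the timed training cell is to iterate over the dataset once beforehand. The following is a minimal sketch of that idea, not a definitive recipe: the random tensors are a hypothetical stand-in for the PlantVillage images, and `warm_up` is an illustrative helper name, not a TensorFlow API.

```python
import time

import tensorflow as tf


def warm_up(dataset):
    """Iterate over the dataset once so that cache() gets filled.

    Returns the number of batches that were read.
    """
    count = 0
    for _ in dataset:
        count += 1
    return count


# Hypothetical stand-in data: 64 random 32x32 RGB images with 4 classes.
images = tf.random.uniform((64, 32, 32, 3))
labels = tf.random.uniform((64,), maxval=4, dtype=tf.int32)

train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(16)
train_ds = train_ds.cache().prefetch(buffer_size=tf.data.AUTOTUNE)

# Pay the one-time loading cost here, outside the timed training cell.
start = time.perf_counter()
num_batches = warm_up(train_ds)
load_seconds = time.perf_counter() - start
print(f"warm-up read {num_batches} batches in {load_seconds:.3f}s")
```

With the warm-up done in its own cell, the call to model.fit can go in a separate cell that starts with %%time. Note that cache() holds the data in host memory rather than on the GPU, and the first epoch may still include some one-time setup cost on the GPU side, so the first epoch can remain somewhat slower than the later ones.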

Answer 1 (2022-11-14 16:31:17)
