Training on GPU much slower than on CPU - why and how to speed it up?
I am training a convolutional neural network on Google Colab, on both the CPU and the GPU.
This is the architecture of the network:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 62, 126, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 63, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 29, 61, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 30, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 12, 28, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 14, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 4, 12, 64) 36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 6, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 768) 0
_________________________________________________________________
dropout (Dropout) (None, 768) 0
_________________________________________________________________
lambda (Lambda) (None, 1, 768) 0
_________________________________________________________________
dense (Dense) (None, 1, 256) 196864
_________________________________________________________________
dense_1 (Dense) (None, 1, 8) 2056
_________________________________________________________________
permute (Permute) (None, 8, 1) 0
_________________________________________________________________
dense_2 (Dense) (None, 8, 36) 72
=================================================================
Total params: 264,560
Trainable params: 264,560
Non-trainable params: 0
So it is a fairly small network, but with a particular output of shape (8, 36), because I want to recognize the characters on license plate images.
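For reference, here is a minimal sketch of how a model with this summary could be defined (the activation functions, pooling sizes, dropout rate and the exact Lambda expression are assumptions; only the layer types and output shapes are taken from the summary above):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                    # (None, 768)
    layers.Dropout(0.5),                                 # dropout rate assumed
    layers.Lambda(lambda t: tf.expand_dims(t, axis=1)),  # (None, 768) -> (None, 1, 768)
    layers.Dense(256, activation='relu'),
    layers.Dense(8),
    layers.Permute((2, 1)),                              # (None, 1, 8) -> (None, 8, 1)
    layers.Dense(36, activation='softmax'),              # (None, 8, 36): 8 characters, 36 classes each
])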
I use this code to train the network:
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=num_train_samples // 128,
                    validation_steps=num_val_samples // 128,
                    epochs=10)
The generator resizes the images to (64, 128). Here is the code for the generator:
# Imports are not shown in the post; the ones below are assumptions
# (imread/resize are assumed to come from scikit-image).
import math
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence

class DataGenerator(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        # x_set: list of image file names, y_set: corresponding labels
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Slice out one batch of file names and labels
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # Read and resize every image of the batch from disk on the fly
        return np.array([
            resize(imread(file_name), (64, 128))
            for file_name in batch_x]), np.array(batch_y)
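For completeness, this is roughly how the generators are wired up to the fit_generator call above; the variables holding the file names and labels (train_image_paths, train_labels, ...) are placeholders, not names from the post.

training_generator = DataGenerator(train_image_paths, train_labels, batch_size=128)
validation_generator = DataGenerator(val_image_paths, val_labels, batch_size=128)

num_train_samples = len(train_image_paths)
num_val_samples = len(val_image_paths)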
On the CPU, one epoch takes 70-90 minutes. On the GPU (149 W), it takes five times as long as on the CPU.
Edit: Here is the link to my notebook: https://colab.research.google.com/drive/1ux9E8DhxPxtgaV60WUiYI2ew2s74Xrwh?usp=sharing
My data is stored on my Google Drive. The training dataset contains 105k images and the validation dataset 76k. All in all, I have 1.8 GB of data.
Should I store the data somewhere else?
Thanks a lot!