
Amazon Web Services P3 slower than local GPU with Keras, TensorFlow and MobileNet

I'm currently training (fine-tuning) a pretrained MobileNet model with Keras and TensorFlow. The training is done on my local computer with a GTX 980.

To speed things up, I created a p3.2xlarge instance on AWS with an Amazon Deep Learning AMI based on Ubuntu (AWS Marketplace).

When running with some test data (~300 images) I noticed that my local computer needs around 10 seconds per epoch, while AWS needs 26 seconds. I even tested it with a p3.16xlarge instance, but there was no big difference. When watching the GPU(s) with

watch -n 1 nvidia-smi 

all the memory (16 GB per GPU) was filled. I tried different amounts of data, different Keras implementations, different batch sizes, and raising the GPU clock speed. When listing the devices (see the snippet right after the code below), the GPU is shown as in use. What could be the reason it runs so slowly? I am using a Jupyter notebook. Below is my test code:

from keras.applications import MobileNet
mobile_model = MobileNet()

for layer in mobile_model.layers[:-4]:
    layer.trainable = False

from keras import models
from keras import layers
from keras import optimizers

# Create the model
model = models.Sequential()

# Add the MobileNet convolutional base model
model.add(mobile_model)

# Add new layers
#model.add(layers.Flatten(return_sequences=True))
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(2, activation='softmax'))

from keras.preprocessing.image import ImageDataGenerator

train_dir = "./painOrNoPain/train/"
validation_dir = "./painOrNoPain/valid/"
image_size = 224

train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=20,
      width_shift_range=0.2,
      height_shift_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batchsize according to your system RAM
train_batchsize = 128
val_batchsize = 128

train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(image_size, image_size),
        batch_size=train_batchsize,
        class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
        validation_dir,
        target_size=(image_size, image_size),
        batch_size=val_batchsize,
        class_mode='categorical',
        shuffle=False)

from keras.utils import multi_gpu_model

# Wrap the model for multi-GPU training if several GPUs are available;
# the bare except falls back to the single-GPU model otherwise.
try:
    model = multi_gpu_model(model)
except:
    pass

from keras.optimizers import Adam
# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['acc'])
# Train the model
history = model.fit_generator(
      train_generator,
      steps_per_epoch=train_generator.samples/train_generator.batch_size,
      epochs=3,
      validation_data=validation_generator,
      validation_steps=validation_generator.samples/validation_generator.batch_size,
      verbose=2)

# Save the model
model.save('small_last4.h5')
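
For completeness, this is the kind of device listing I mean (a minimal check, assuming the TensorFlow 1.x backend that the Deep Learning AMI ships with):

from tensorflow.python.client import device_lib

# Print all devices visible to TensorFlow; the P3 GPU should show up as an
# entry with device_type 'GPU' (e.g. /device:GPU:0).
print(device_lib.list_local_devices())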

I'm having the same problem. Running on an Amazon AWS GPU is even slower than running on the CPU of my own laptop. A possible explanation is the large amount of time spent transferring data between the CPU and the GPU, as discussed in this post.
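
One thing worth trying (a sketch only, with illustrative values, not verified on this exact setup): let Keras load and augment batches in parallel worker processes so the GPU spends less time waiting on the CPU-side ImageDataGenerator. In Keras 2, fit_generator already accepts workers, use_multiprocessing and max_queue_size:

# Same training call as above, but with parallel data loading; the worker
# count and queue size are illustrative and should be tuned to the instance.
history = model.fit_generator(
      train_generator,
      steps_per_epoch=train_generator.samples // train_generator.batch_size,
      epochs=3,
      validation_data=validation_generator,
      validation_steps=validation_generator.samples // validation_generator.batch_size,
      workers=8,                 # CPU processes preparing batches in parallel
      use_multiprocessing=True,  # process-based workers instead of threads
      max_queue_size=20,         # number of prepared batches kept ready for the GPU
      verbose=2)

If the per-epoch time drops noticeably with more workers, the bottleneck is the Python-side input pipeline rather than the GPU itself.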
