
Saving and loading some models takes a very long time in Keras

I've noticed that when doing the following workflow:

  1. load a pre-trained model from keras.applications with weights from ImageNet
  2. fine-tune this model on new data
  3. save the fine-tuned model to an HDF5 file with model.save('file.h5')
  4. re-load the model somewhere else with load_model('file.h5')

The saving and loading steps can take a very long time for some models.

When using VGG16, VGG19, or MobileNet, saving and loading are very quick (a few seconds at most).

However, when using NASNet, InceptionV3, or DenseNet121, saving and loading can each take anywhere from 10 to 30 minutes, as the following examples illustrate:

import keras
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model, load_model

# VGG16
model_ = keras.applications.vgg16.VGG16(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(model_.output)
x = Dense(16, activation='softmax')(x)
my_model = Model(inputs=model_.input, outputs=x)
my_model.fit(some_data)  # placeholder for the fine-tuning step
my_model.save('file.h5') # takes 2 seconds
load_model('file.h5') # takes 2 seconds

# NASNetMobile
model_ = keras.applications.nasnet.NASNetMobile(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(model_.output)
x = Dense(16, activation='softmax')(x)
my_model = Model(inputs=model_.input, outputs=x)
my_model.fit(some_data)
my_model.save('file.h5') # takes 10 minutes
load_model('file.h5') # takes 5 minutes

# DenseNet121
model_ = keras.applications.densenet.DenseNet121(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(model_.output)
x = Dense(16, activation='softmax')(x)
my_model = Model(inputs=model_.input, outputs=x)
my_model.fit(some_data)
my_model.save('file.h5') # takes 10 minutes
load_model('file.h5') # takes 5 minutes

Monitoring the file from the command line while saving, I can see file.h5 being built up slowly, at around 100 KB per minute in the worst cases; then, once it reaches about 22 MB, it very quickly completes to the full size (80-100 MB depending on the model).
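The growth can be reproduced with a small polling script rather than repeated shell commands (a sketch; `file.h5` is whatever path `model.save()` is writing to):

```python
import os
import time

def watch_size(path, interval=5.0, polls=3):
    """Print the size of `path` every `interval` seconds, `polls` times."""
    for _ in range(polls):
        if os.path.exists(path):
            print('%s: %.1f KB' % (path, os.path.getsize(path) / 1024.0))
        else:
            print('%s: not created yet' % path)
        time.sleep(interval)

# Run in a second process while model.save('file.h5') is in progress.
watch_size('file.h5', interval=1.0, polls=2)
```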

Is this "standard behaviour", i.e. are these models inherently complex enough that such long saving/loading durations are expected, or is something wrong? And can anything be done to mitigate it?
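One workaround that is sometimes suggested for slow HDF5 round-trips is to serialize the architecture and the weights separately instead of making a single `model.save()` call. A minimal sketch with a tiny stand-in model (whether this actually helps for NASNet/DenseNet on these exact versions is an assumption to verify):

```python
import numpy as np
from keras.layers import Dense
from keras.models import Sequential, model_from_json

# Tiny stand-in model; in practice this would be the fine-tuned
# NASNet / DenseNet model from the question.
model = Sequential([Dense(4, activation='relu', input_shape=(8,)),
                    Dense(2, activation='softmax')])

# Serialize the architecture (JSON) and the weights (HDF5) separately
# instead of one model.save('file.h5') call.
with open('model.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('weights.h5')

# Restore from the two files.
with open('model.json') as f:
    restored = model_from_json(f.read())
restored.load_weights('weights.h5')

# The restored weights match the originals.
assert np.allclose(model.get_weights()[0], restored.get_weights()[0])
```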

Configuration used:

  • Keras 2.2 with TensorFlow backend
  • TensorFlow-GPU 1.13
  • Python 3.6
  • CUDA 10.1
  • running on an AWS Deep Learning EC2 pre-configured instance

I'm having a similar experience trying to load a ResNet50 in TF 2.0 with Keras. Not sure what's up, but I see 100% utilization on a single CPU core (out of the 64 available).
