Transfer Learning Trainable Model Throws Errors On Saving

I have downloaded a pretrained model, and I'm trying to apply transfer learning to it. I load the model, which is saved as an 'xray_model.h5' file, and set it as untrainable:

model = tf.keras.models.load_model('xray_model.h5')
model.trainable = False

Later I take the input of the first layer and the output of the "flatten" layer, and build my additions on top of them:

base_input = model.layers[0].input
base_output = model.get_layer(name="flatten").output

base_output = build_model()(base_output)

new_model = keras.Model(inputs=base_input, outputs=base_output)

Since I want to train my layers (and after some experimenting, I realized that I might need to train the old layers too), I want to set the model as trainable:

for i in range(len(new_model.layers)):
    new_model._layers[i].trainable = True

But when I start training it with the following callbacks:

METRICS = ['accuracy',
           tf.keras.metrics.Precision(name='precision'),
           tf.keras.metrics.Recall(name='recall'),
           lr_metric]

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001, verbose=1)

save_callback = tf.keras.callbacks.ModelCheckpoint("new_xray_model.h5",
                                                   save_best_only=True,
                                                   monitor='accuracy')
history = new_model.fit(train_generator,
                        verbose=1,
                        steps_per_epoch=BATCH_SIZE,
                        epochs=EPOCHS,
                        validation_data=test_generator,
                        callbacks=[save_callback, reduce_lr])

I get the following error:

File "C:\Users\jm10o\AppData\Local\Programs\Python\Python38\lib\site-packages\h5py\_hl\group.py", line 373, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

Process finished with exit code 1

I noticed that it happens only when I try to further train the model that I loaded. I couldn't find any solution for it.

The problem comes from the ModelCheckpoint callback: for each epoch, you save the model under the same name.

Use the following format, which puts the epoch number in the file name:

ModelCheckpoint('your_model_name{epoch:0d}.h5',
                    monitor='accuracy')
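
To see what the `{epoch:0d}` placeholder changes, here is a minimal plain-Python sketch. The helper `checkpoint_paths` is hypothetical and not part of Keras; it assumes the callback fills the file name template with the 1-based epoch number through `str.format`, which is how I understand ModelCheckpoint to build its paths.

```python
# Sketch only: mimics how a checkpoint file name template expands per epoch.
# checkpoint_paths is a hypothetical helper, not a Keras function.
# Assumption: the template is filled with str.format and a 1-based epoch number.

def checkpoint_paths(template, num_epochs):
    """Return the file name that would be produced for each epoch."""
    return [template.format(epoch=epoch) for epoch in range(1, num_epochs + 1)]


# Template from the question: no placeholder, so every epoch maps to one file.
fixed = checkpoint_paths("new_xray_model.h5", 3)

# Template from this answer: the epoch number becomes part of the file name.
per_epoch = checkpoint_paths("your_model_name{epoch:0d}.h5", 3)

print(fixed)
print(per_epoch)
```

With the first template all three epochs target the same file, while the second yields a distinct file per epoch. Keep in mind that this also means one file on disk for every epoch that gets saved.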
