[英]Model learns when using keras but doesn't with tf.keras
我正在使用的 model 是这样的:
from keras.layers import (Input, MaxPooling1D, Dropout,
BatchNormalization, Activation, Add,
Flatten, Conv1D, Dense)
from keras.models import Model
import numpy as np
class ResidualUnit(object):
"""References
----------
.. [1] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks,"
arXiv:1603.05027 [cs], Mar. 2016. https://arxiv.org/pdf/1603.05027.pdf.
.. [2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. https://arxiv.org/pdf/1512.03385.pdf
"""
def __init__(self, n_samples_out, n_filters_out, kernel_initializer='he_normal',
dropout_rate=0.8, kernel_size=17, preactivation=True,
postactivation_bn=False, activation_function='relu'):
self.n_samples_out = n_samples_out
self.n_filters_out = n_filters_out
self.kernel_initializer = kernel_initializer
self.dropout_rate = dropout_rate
self.kernel_size = kernel_size
self.preactivation = preactivation
self.postactivation_bn = postactivation_bn
self.activation_function = activation_function
def _skip_connection(self, y, downsample, n_filters_in):
"""Implement skip connection."""
# Deal with downsampling
if downsample > 1:
y = MaxPooling1D(downsample, strides=downsample, padding='same')(y)
elif downsample == 1:
y = y
else:
raise ValueError("Number of samples should always decrease.")
# Deal with n_filters dimension increase
if n_filters_in != self.n_filters_out:
# This is one of the two alternatives presented in ResNet paper
# Other option is to just fill the matrix with zeros.
y = Conv1D(self.n_filters_out, 1, padding='same',
use_bias=False,
kernel_initializer=self.kernel_initializer
)(y)
return y
def _batch_norm_plus_activation(self, x):
if self.postactivation_bn:
x = Activation(self.activation_function)(x)
x = BatchNormalization(center=False, scale=False)(x)
else:
x = BatchNormalization()(x)
x = Activation(self.activation_function)(x)
return x
def __call__(self, inputs):
"""Residual unit."""
x, y = inputs
n_samples_in = y.shape[1]
downsample = n_samples_in // self.n_samples_out
n_filters_in = y.shape[2]
y = self._skip_connection(y, downsample, n_filters_in)
# 1st layer
x = Conv1D(self.n_filters_out, self.kernel_size, padding='same',
use_bias=False,
kernel_initializer=self.kernel_initializer
)(x)
x = self._batch_norm_plus_activation(x)
if self.dropout_rate > 0:
x = Dropout(self.dropout_rate)(x)
# 2nd layer
x = Conv1D(self.n_filters_out, self.kernel_size, strides=downsample,
padding='same', use_bias=False,
kernel_initializer=self.kernel_initializer
)(x)
if self.preactivation:
x = Add()([x, y]) # Sum skip connection and main connection
y = x
x = self._batch_norm_plus_activation(x)
if self.dropout_rate > 0:
x = Dropout(self.dropout_rate)(x)
else:
x = BatchNormalization()(x)
x = Add()([x, y]) # Sum skip connection and main connection
x = Activation(self.activation_function)(x)
if self.dropout_rate > 0:
x = Dropout(self.dropout_rate)(x)
y = x
return [x, y]
# ----- Model ----- #
kernel_size = 16
kernel_initializer = 'he_normal'
signal = Input(shape=(1000, 12), dtype=np.float32, name='signal')
age_range = Input(shape=(6,), dtype=np.float32, name='age_range')
is_male = Input(shape=(1,), dtype=np.float32, name='is_male')
x = signal
x = Conv1D(64, kernel_size, padding='same', use_bias=False,
kernel_initializer=kernel_initializer
)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x, y = ResidualUnit(512, 128, kernel_size=kernel_size,
kernel_initializer=kernel_initializer
)([x, x])
x, y = ResidualUnit(256, 196, kernel_size=kernel_size,
kernel_initializer=kernel_initializer
)([x, y])
x, y = ResidualUnit(64, 256, kernel_size=kernel_size,
kernel_initializer=kernel_initializer
)([x, y])
x, _ = ResidualUnit(16, 320, kernel_size=kernel_size, kernel_initializer=kernel_initializer
)([x, y])
x = Flatten()(x)
diagn = Dense(2, activation='sigmoid', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)
model.summary()
# ----- Train ----- #
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
loss = 'binary_crossentropy'
lr = 0.001
batch_size = 64
opt = Adam(learning_rate=0.001)
callbacks = [ReduceLROnPlateau(monitor='val_loss',
factor=0.1,
patience=7,
min_lr=lr / 100)]
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
history = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=70,
initial_epoch=0,
validation_split=0.1,
shuffle='batch',
callbacks=callbacks,
verbose=1)
# Save final result
model.save("./final_model_middle_one.hdf5")
当我用 tf.keras(我需要使用 qkeras 库)代替 Keras 的使用时,model 不会在每次迭代时学习并陷入低得多的精度。 这可能是什么原因造成的?
当我使用 keras 时,准确率从 83% 开始很高,并且在训练期间略有增加。
Train on 17340 samples, validate on 1927 samples
Epoch 1/70
17340/17340 [==============================] - 33s 2ms/step - loss: 0.3908 - accuracy: 0.8314 - val_loss: 0.3283 - val_accuracy: 0.8710
Epoch 2/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3641 - accuracy: 0.8416 - val_loss: 0.3340 - val_accuracy: 0.8612
Epoch 3/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3525 - accuracy: 0.8483 - val_loss: 0.3847 - val_accuracy: 0.8550
Epoch 4/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3354 - accuracy: 0.8563 - val_loss: 0.4641 - val_accuracy: 0.8215
Epoch 5/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3269 - accuracy: 0.8590 - val_loss: 0.7172 - val_accuracy: 0.7870
Epoch 6/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3202 - accuracy: 0.8630 - val_loss: 0.3599 - val_accuracy: 0.8617
Epoch 7/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3101 - accuracy: 0.8678 - val_loss: 0.2659 - val_accuracy: 0.8934
Epoch 8/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3058 - accuracy: 0.8688 - val_loss: 0.5683 - val_accuracy: 0.8293
Epoch 9/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.2980 - accuracy: 0.8739 - val_loss: 0.3442 - val_accuracy: 0.8643
Epoch 10/70
7424/17340 [===========>..................] - ETA: 17s - loss: 0.2966 - accuracy: 0.8707
当我使用 tf.keras 时,准确度从 50% 开始,并且在训练期间没有显着增加:
Epoch 1/70
271/271 [==============================] - 30s 110ms/step - loss: 0.9325 - accuracy: 0.5093 - val_loss: 0.6973 - val_accuracy: 0.5470 - lr: 0.0010
Epoch 2/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8424 - accuracy: 0.5157 - val_loss: 0.6660 - val_accuracy: 0.6528 - lr: 0.0010
Epoch 3/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8066 - accuracy: 0.5213 - val_loss: 0.6441 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 4/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7884 - accuracy: 0.5272 - val_loss: 0.6649 - val_accuracy: 0.6559 - lr: 0.0010
Epoch 5/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7888 - accuracy: 0.5368 - val_loss: 0.6899 - val_accuracy: 0.5760 - lr: 0.0010
Epoch 6/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7617 - accuracy: 0.5304 - val_loss: 0.6641 - val_accuracy: 0.6533 - lr: 0.0010
Epoch 7/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7485 - accuracy: 0.5333 - val_loss: 0.6450 - val_accuracy: 0.6544 - lr: 0.0010
Epoch 8/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7431 - accuracy: 0.5382 - val_loss: 0.6599 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 9/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7336 - accuracy: 0.5421 - val_loss: 0.6532 - val_accuracy: 0.6554 - lr: 0.0010
Epoch 10/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7274 - accuracy: 0.5379 - val_loss: 0.6753 - val_accuracy: 0.6492 - lr: 0.0010
两次试验之间更改的行是我通过添加“tensorflow”导入 keras 模块的行。 在他们面前。 我不知道为什么结果会如此不同,可能是由于某些参数的默认值不同?
它可能与如何在keras
vs tf.keras
中计算准确度指标有关。 据我所知,当您使用单热编码 output 时,通常使用 function 的准确性。 但是,您似乎正在输出两个值[A, B]
,每个值都应用了sigmoid
function。 由于我不知道您使用的标签,可能有两种情况:
a)您想预测A or B
。 如果 sos 我会将激活 function 更改为softmax
b)您想在A or not A
和B or not B
与 B 之间进行预测。 在这种情况下,我将修改 output 张量形状以具有两个头,每个头都有两个值: head_A = [A, not_A]
和head_B = [B, not_B]
。 然后我会分别对标签进行热编码,然后我会假设你可以使用accuracy
指标。
或者,您可以创建适合您的 output 形状的自定义指标。
我有一个类似(相同?)的问题,我正在处理来自 Kaggle 的一些示例,并且无法使用 keras 保存 model。 经过多次谷歌搜索,我意识到我需要使用 tensorflow.keras。 这解决了我的问题,但是我拥有和用于训练的 60000 个数据项下降到报告的 1875。尽管错误仍然是 10%。
1875 * 32 = 60000。
这是我的适合。
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, verbose=True, callbacks=[early_stopping_monitor])
1539/1875 [=======================>......] - ETA: 3s - loss: 0.4445 - accuracy: 0.8418
事实证明,fit 默认批量大小为 32。如果我将批量大小增加到 64,我得到报告数据集的一半,这是有道理的:
model.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), epochs=epochs, verbose=True, callbacks=[early_stopping_monitor])
938/938 [==============================] - 16s 17ms/step - loss: 0.4568 - accuracy: 0.8388
我从您的代码中注意到您已将 batch_size 设置为 64,并且您报告的数据项从 17340 减少到 271,大约是第 64 位,由于您使用的数据,这也一定会影响您的准确性。
从这里的文档: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential
批量大小
Integer 或无。 每次梯度更新的样本数。 如果未指定,batch_size 将默认为 32。如果您的数据是数据集、生成器或 keras.utils.Sequence 实例(因为它们生成批次)的形式,请不要指定 batch_size。
From the Keras docs: https://keras.rstudio.com/reference/fit.html , it also says that the batch size defaults to 32, it must just be reported differently when training the model.
希望这可以帮助。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.