
Model learns when using keras but doesn't with tf.keras

The model I am using is the following:

from keras.layers import (Input, MaxPooling1D, Dropout,
                          BatchNormalization, Activation, Add,
                          Flatten, Conv1D, Dense)
from keras.models import Model
import numpy as np


class ResidualUnit(object):
    """References
    ----------
    .. [1] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks,"
           arXiv:1603.05027 [cs], Mar. 2016. https://arxiv.org/pdf/1603.05027.pdf.
    .. [2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference
           on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. https://arxiv.org/pdf/1512.03385.pdf
    """

    def __init__(self, n_samples_out, n_filters_out, kernel_initializer='he_normal',
                 dropout_rate=0.8, kernel_size=17, preactivation=True,
                 postactivation_bn=False, activation_function='relu'):
        self.n_samples_out = n_samples_out
        self.n_filters_out = n_filters_out
        self.kernel_initializer = kernel_initializer
        self.dropout_rate = dropout_rate
        self.kernel_size = kernel_size
        self.preactivation = preactivation
        self.postactivation_bn = postactivation_bn
        self.activation_function = activation_function

    def _skip_connection(self, y, downsample, n_filters_in):
        """Implement skip connection."""
        # Deal with downsampling
        if downsample > 1:
            y = MaxPooling1D(downsample, strides=downsample, padding='same')(y)
        elif downsample == 1:
            y = y
        else:
            raise ValueError("Number of samples should always decrease.")
        # Deal with n_filters dimension increase
        if n_filters_in != self.n_filters_out:
            # This is one of the two alternatives presented in ResNet paper
            # Other option is to just fill the matrix with zeros.
            y = Conv1D(self.n_filters_out, 1, padding='same',
                       use_bias=False,
                       kernel_initializer=self.kernel_initializer
                       )(y)
        return y

    def _batch_norm_plus_activation(self, x):
        if self.postactivation_bn:
            x = Activation(self.activation_function)(x)
            x = BatchNormalization(center=False, scale=False)(x)
        else:
            x = BatchNormalization()(x)
            x = Activation(self.activation_function)(x)
        return x

    def __call__(self, inputs):
        """Residual unit."""
        x, y = inputs
        n_samples_in = y.shape[1]
        downsample = n_samples_in // self.n_samples_out
        n_filters_in = y.shape[2]
        y = self._skip_connection(y, downsample, n_filters_in)
        # 1st layer
        x = Conv1D(self.n_filters_out, self.kernel_size, padding='same',
                   use_bias=False,
                   kernel_initializer=self.kernel_initializer
                   )(x)
        x = self._batch_norm_plus_activation(x)
        if self.dropout_rate > 0:
            x = Dropout(self.dropout_rate)(x)

        # 2nd layer
        x = Conv1D(self.n_filters_out, self.kernel_size, strides=downsample,
                   padding='same', use_bias=False,
                   kernel_initializer=self.kernel_initializer
                   )(x)
        if self.preactivation:
            x = Add()([x, y])  # Sum skip connection and main connection
            y = x
            x = self._batch_norm_plus_activation(x)
            if self.dropout_rate > 0:
                x = Dropout(self.dropout_rate)(x)
        else:
            x = BatchNormalization()(x)
            x = Add()([x, y])  # Sum skip connection and main connection
            x = Activation(self.activation_function)(x)
            if self.dropout_rate > 0:
                x = Dropout(self.dropout_rate)(x)
            y = x
        return [x, y]


# ----- Model ----- #
kernel_size = 16
kernel_initializer = 'he_normal'
signal = Input(shape=(1000, 12), dtype=np.float32, name='signal')
age_range = Input(shape=(6,), dtype=np.float32, name='age_range')
is_male = Input(shape=(1,), dtype=np.float32, name='is_male')
x = signal
x = Conv1D(64, kernel_size, padding='same', use_bias=False,
           kernel_initializer=kernel_initializer
           )(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x, y = ResidualUnit(512, 128, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, x])
x, y = ResidualUnit(256, 196, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, y])
x, y = ResidualUnit(64, 256, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer
                    )([x, y])
x, _ = ResidualUnit(16, 320, kernel_size=kernel_size, kernel_initializer=kernel_initializer
                    )([x, y])
x = Flatten()(x)
diagn = Dense(2, activation='sigmoid', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)

model.summary()


# ----- Train ----- #

from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

loss = 'binary_crossentropy'
lr = 0.001
batch_size = 64
opt = Adam(learning_rate=0.001)
callbacks = [ReduceLROnPlateau(monitor='val_loss',
                                factor=0.1,
                                patience=7,
                                min_lr=lr / 100)]

model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=70,
                    initial_epoch=0, 
                    validation_split=0.1,
                    shuffle='batch', 
                    callbacks=callbacks,
                    verbose=1)

# Save final result
model.save("./final_model_middle_one.hdf5")

When I substitute tf.keras for keras (I need it in order to use the qkeras library), the model no longer learns from iteration to iteration and gets stuck at a much lower accuracy. What could be causing this?

When I use keras, the accuracy starts high at 83% and increases slightly during training:

Train on 17340 samples, validate on 1927 samples
Epoch 1/70
17340/17340 [==============================] - 33s 2ms/step - loss: 0.3908 - accuracy: 0.8314 - val_loss: 0.3283 - val_accuracy: 0.8710
Epoch 2/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3641 - accuracy: 0.8416 - val_loss: 0.3340 - val_accuracy: 0.8612
Epoch 3/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3525 - accuracy: 0.8483 - val_loss: 0.3847 - val_accuracy: 0.8550
Epoch 4/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3354 - accuracy: 0.8563 - val_loss: 0.4641 - val_accuracy: 0.8215
Epoch 5/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3269 - accuracy: 0.8590 - val_loss: 0.7172 - val_accuracy: 0.7870
Epoch 6/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3202 - accuracy: 0.8630 - val_loss: 0.3599 - val_accuracy: 0.8617
Epoch 7/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3101 - accuracy: 0.8678 - val_loss: 0.2659 - val_accuracy: 0.8934
Epoch 8/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3058 - accuracy: 0.8688 - val_loss: 0.5683 - val_accuracy: 0.8293
Epoch 9/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.2980 - accuracy: 0.8739 - val_loss: 0.3442 - val_accuracy: 0.8643
Epoch 10/70
 7424/17340 [===========>..................] - ETA: 17s - loss: 0.2966 - accuracy: 0.8707

When I use tf.keras, the accuracy starts at 50% and does not increase significantly during training:

Epoch 1/70
271/271 [==============================] - 30s 110ms/step - loss: 0.9325 - accuracy: 0.5093 - val_loss: 0.6973 - val_accuracy: 0.5470 - lr: 0.0010
Epoch 2/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8424 - accuracy: 0.5157 - val_loss: 0.6660 - val_accuracy: 0.6528 - lr: 0.0010
Epoch 3/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8066 - accuracy: 0.5213 - val_loss: 0.6441 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 4/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7884 - accuracy: 0.5272 - val_loss: 0.6649 - val_accuracy: 0.6559 - lr: 0.0010
Epoch 5/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7888 - accuracy: 0.5368 - val_loss: 0.6899 - val_accuracy: 0.5760 - lr: 0.0010
Epoch 6/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7617 - accuracy: 0.5304 - val_loss: 0.6641 - val_accuracy: 0.6533 - lr: 0.0010
Epoch 7/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7485 - accuracy: 0.5333 - val_loss: 0.6450 - val_accuracy: 0.6544 - lr: 0.0010
Epoch 8/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7431 - accuracy: 0.5382 - val_loss: 0.6599 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 9/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7336 - accuracy: 0.5421 - val_loss: 0.6532 - val_accuracy: 0.6554 - lr: 0.0010
Epoch 10/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7274 - accuracy: 0.5379 - val_loss: 0.6753 - val_accuracy: 0.6492 - lr: 0.0010

The only lines changed between the two trials are the imports, where I prefixed the keras modules with "tensorflow.". I don't know why the results are so different; could it be due to different default values of some parameters?
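
For reference, the change amounts to prefixing the imports (a sketch of what the tf.keras run looks like; everything else stays identical):

# Same imports as above, but routed through tensorflow.keras
from tensorflow.keras.layers import (Input, MaxPooling1D, Dropout,
                                     BatchNormalization, Activation, Add,
                                     Flatten, Conv1D, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau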

It may be related to how the accuracy metric is computed in keras vs. tf.keras. As far as I know, the accuracy metric is typically used with one-hot encoded outputs. However, you seem to be outputting two values [A, B], each with a sigmoid function applied. Since I don't know which labels you are using, there are two likely cases:

a) You want to predict A or B. If so, I would change the activation function to softmax.

b) You want to predict between A or not A, and B or not B. In that case, I would modify the output tensor to have two heads, each with two values: head_A = [A, not_A] and head_B = [B, not_B]. I would then one-hot encode the labels separately, after which I assume you could use the accuracy metric; see the sketch below.

Alternatively, you could create a custom metric that fits your output shape.
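
A minimal sketch of both options, reusing x, signal, and kernel_initializer from the question (the label shapes here are my assumption):

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Option (a): one two-class decision -- a single softmax head.
# Assumes y_train is one-hot encoded with shape (n, 2).
diagn = Dense(2, activation='softmax', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])

# Option (b): two independent binary decisions -- two softmax heads.
# Assumes two one-hot label arrays y_a ([A, not_A]) and y_b ([B, not_B]).
head_a = Dense(2, activation='softmax', name='head_A')(x)
head_b = Dense(2, activation='softmax', name='head_B')(x)
model = Model(signal, [head_a, head_b])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, [y_a, y_b], ...)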

I had a similar (identical?) problem. I was working through some examples from Kaggle and could not save the model using keras. After much googling I realized that I needed to use tensorflow.keras. This fixed my problem, but the 60000 data items I have and use for training dropped to a reported 1875, although the error was still 10%.

1875 * 32 = 60000。

Here is my fit:

model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, verbose=True, callbacks=[early_stopping_monitor])

1539/1875 [=======================>......] - ETA: 3s - loss: 0.4445 - accuracy: 0.8418

It turns out that fit has a default batch size of 32. If I increase the batch size to 64, I get half the reported dataset size, which makes sense:

model.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), epochs=epochs, verbose=True, callbacks=[early_stopping_monitor])

938/938 [==============================] - 16s 17ms/step - loss: 0.4568 - accuracy: 0.8388

I notice from your code that you have set batch_size to 64, and your reported data items dropped from 17340 to 271, which is roughly a 64th of it. Given the data you are using, this must also be affecting your accuracy.
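
As a quick sanity check of those step counts (my own arithmetic, not part of either training script), the tf.keras progress bar counts batches rather than samples:

import math

print(math.ceil(60000 / 32))  # 1875 -- the Kaggle example above
print(math.ceil(60000 / 64))  # 938  -- batch_size=64 halves the step count
print(math.ceil(17340 / 64))  # 271  -- matches the question's tf.keras log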

From the docs here: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential

batch_size
Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances (since they generate batches).

From the Keras docs: https://keras.rstudio.com/reference/fit.html , it also says that the batch size defaults to 32; the progress must simply be reported differently when training the model.

Hope this helps.
