Feature learning with triplet loss after 1-2 epochs yields 100% val accuracy?

My NN has to learn image similarity with a custom triplet loss. The positive image is similar to the anchor, while the negative image is not.

My task is to predict which of the second and third images of an unseen triplet is more similar to the anchor.

The triplets are given for both the train and test sets in my task, so I did not have to mine them or generate them randomly: they are fixed.

---> Idea: to improve my model, I try feature learning with the Xception layers frozen and a Dense layer added on top.

Problem:

When training the model below with the Xception layers frozen, after 1-2 epochs it learns to simply map all positive images to a very low distance from the anchor and all negative images to a very high distance. Hence the 100% val accuracy.

I immediately thought of overfitting, but I only train a single fully connected layer. How can I combat this? Or is my triplet loss somehow defined incorrectly?

I don't use data augmentation, so could that potentially help?

Somehow this happens only when using a pretrained model. When I use a simple model, I get realistic accuracy...

What am I missing here?

My triplet loss:

def triplet_loss(y_true, y_pred, alpha = 0.4):
    """
    Implementation of the triplet loss function
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor data
            positive -- the encodings for the positive data (similar to anchor)
            negative -- the encodings for the negative data (different from anchor)
    Returns:
    loss -- real number, value of the loss
    """

    total_length = y_pred.shape.as_list()[-1]

    anchor = y_pred[:,0:int(total_length*1/3)]
    positive = y_pred[:,int(total_length*1/3):int(total_length*2/3)]
    negative = y_pred[:,int(total_length*2/3):int(total_length*3/3)]

    # distance between the anchor and the positive
    pos_dist = K.sum(K.square(anchor-positive),axis=1)

    # distance between the anchor and the negative
    neg_dist = K.sum(K.square(anchor-negative),axis=1)

    # compute loss
    basic_loss = pos_dist-neg_dist+alpha
    loss = K.maximum(basic_loss,0.0)

    return loss
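
A quick NumPy sanity check of the same formula (my own illustration, not part of the original post) shows the hinge going to zero as soon as the negative ends up at least alpha farther from the anchor than the positive:

import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.4):
    # same formula as the Keras loss above, written in NumPy for a quick check
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(pos_dist - neg_dist + alpha, 0.0)

a = np.array([[1.0, 0.0]])    # toy 2-D embedding for the anchor
p = np.array([[0.9, 0.1]])    # positive: close to the anchor
n = np.array([[-1.0, 0.0]])   # negative: far from the anchor
print(triplet_loss_np(a, p, n))   # [0.] -- hinge is inactive once neg_dist > pos_dist + alpha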

Then my model:

def baseline_model():
    input_1 = Input(shape=(256, 256, 3))
    input_2 = Input(shape=(256, 256, 3))
    input_3 = Input(shape=(256, 256, 3))

    pretrained_model = Xception(include_top=False, weights="imagenet")

    for layer in pretrained_model.layers:
        layer.trainable = False

    x1 = pretrained_model(input_1)
    x2 = pretrained_model(input_2)
    x3 = pretrained_model(input_3)

    x1 = Flatten(name='flatten1')(x1)
    x2 = Flatten(name='flatten2')(x2) 
    x3 = Flatten(name='flatten3')(x3)

    x1 = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x1)
    x2 = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x2)
    x3 = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x3)

    x1 = Lambda(lambda x: K.l2_normalize(x,axis=-1))(x1)
    x2 = Lambda(lambda x: K.l2_normalize(x,axis=-1))(x2)
    x3 = Lambda(lambda x: K.l2_normalize(x,axis=-1))(x3)

    concat_vector = concatenate([x1, x2, x3], axis=-1, name='concat')

    model = Model([input_1, input_2, input_3], concat_vector)

    model.compile(loss=triplet_loss, optimizer=Adam(0.00001), metrics=[accuracy])

    model.summary()

    return model
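
The accuracy metric passed to compile is not defined in the post. A common choice for this kind of setup (an assumption on my part, written here as triplet_accuracy) is the fraction of triplets whose positive is strictly closer to the anchor than the negative, which is exactly the quantity that hits 100% on the validation set:

def triplet_accuracy(y_true, y_pred):
    # hypothetical metric, not shown in the original post: fraction of triplets
    # where the anchor-positive distance is smaller than the anchor-negative one
    total_length = y_pred.shape.as_list()[-1]
    anchor = y_pred[:, 0:int(total_length * 1 / 3)]
    positive = y_pred[:, int(total_length * 1 / 3):int(total_length * 2 / 3)]
    negative = y_pred[:, int(total_length * 2 / 3):]
    pos_dist = K.sum(K.square(anchor - positive), axis=1)
    neg_dist = K.sum(K.square(anchor - negative), axis=1)
    return K.mean(K.cast(pos_dist < neg_dist, 'float32'))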

Fitting my model:

model.fit(
        gen(X_train,batch_size=batch_size),
        steps_per_epoch=13281 // batch_size,
        epochs=10,
        validation_data=gen(X_val,batch_size=batch_size),
        validation_steps=1666 // batch_size,
        verbose=1,
        callbacks=callbacks_list
        )
model.save_weights('try_6.h5') 
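
The gen generator is not shown either; a minimal sketch of what it might look like for fixed triplets (hypothetical, assuming X_train is an array of shape (n, 3, 256, 256, 3) holding anchor/positive/negative images) is:

import numpy as np

def gen(triplets, batch_size=32):
    # hypothetical generator, not from the original post: yields batches of
    # [anchor, positive, negative] images plus a dummy target, since
    # triplet_loss ignores y_true
    n = len(triplets)
    while True:
        idx = np.random.randint(0, n, size=batch_size)
        batch = triplets[idx]
        yield [batch[:, 0], batch[:, 1], batch[:, 2]], np.zeros((batch_size, 1))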

Answer:

Please note that you use a different Dense layer for each input (you define 3 separate Dense layers; each time you create a new Dense object, it generates a new layer with new parameters, independent of the layers you created before). If the inputs are consistent, meaning input 1 is always the anchor, input 2 always the positive, and input 3 always the negative, it will be super easy for the model to overfit. What you should probably do is use only a single, shared Dense layer for all 3 inputs.

For example, based on your code, you can define the model like this:

pretrained_model = Xception(include_top=False, weights="imagenet")
for layer in pretrained_model.layers:
    layer.trainable = False

general_input = Input(shape=(256, 256, 3))
x = pretrained_model(general_input)
x = Flatten()(x)
x = Dense(128, activation=None,kernel_regularizer=l2(0.01))(x)

base_model = Model([general_input], [x])

input_1 = Input(shape=(256, 256, 3))
input_2 = Input(shape=(256, 256, 3))
input_3 = Input(shape=(256, 256, 3))

x1 = base_model(input_1)
x2 = base_model(input_2)
x3 = base_model(input_3)

# ... continue with your code - normalize, concat, etc.
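
One possible continuation of that sketch, simply reusing the normalization, concatenation and compile steps from the question:

x1 = Lambda(lambda x: K.l2_normalize(x, axis=-1))(x1)
x2 = Lambda(lambda x: K.l2_normalize(x, axis=-1))(x2)
x3 = Lambda(lambda x: K.l2_normalize(x, axis=-1))(x3)

concat_vector = concatenate([x1, x2, x3], axis=-1, name='concat')

model = Model([input_1, input_2, input_3], concat_vector)
# same compile call as in the question ('accuracy' here is whatever custom
# metric the original post used -- it is not shown there)
model.compile(loss=triplet_loss, optimizer=Adam(0.00001), metrics=[accuracy])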
