
Improve Accuracy for a Siamese Network

I wrote this little model using the Keras Functional API to find the similarity of a dialogue between two individuals. I am using Gensim's Doc2Vec embeddings to transform the text data into vectors (vocab size: 4117). My data is split into 56 positive cases and 64 negative cases. (Yes, I know the dataset is small - but that's all I have for the time being.)
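
For reference, a minimal sketch of how per-turn Doc2Vec vectors like these can be produced with Gensim; the corpus, tokenization and tags below are placeholder assumptions, not the question's actual preprocessing:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical dialogue turns; the real preprocessing isn't shown in the question
turns = [["hello", "how", "are", "you"], ["fine", "thanks", "and", "you"]]
tagged = [TaggedDocument(words=t, tags=[str(i)]) for i, t in enumerate(turns)]

d2v = Doc2Vec(tagged, vector_size=200, min_count=1, epochs=20)

# Each turn becomes a 200-dimensional vector; stacking 38 turns per dialogue
# would give the (38, 200) input shape used by the model below
vec = d2v.infer_vector(["hello", "how", "are", "you"])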

# Imports assumed to be tf.keras; adjust if you use standalone Keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Input, Embedding, Conv2D, TimeDistributed, LSTM,
                                     Activation, Subtract, Multiply, Lambda, Concatenate,
                                     Bidirectional, GlobalMaxPooling1D, Dense)
from tensorflow.keras.models import Model

def euclidean_distance(vects):
    # Euclidean distance between the two branch outputs,
    # clipped away from zero so the sqrt stays numerically stable
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

ch_inp = Input(shape=(38, 200))
csr_inp = Input(shape=(38, 200))

inp = Input(shape=(38, 200))
net = Embedding(int(vocab_size), 16)(inp)
net = Conv2D(16, 1, activation='relu')(net)
net = TimeDistributed(LSTM(8, return_sequences=True))(net)
out = Activation('relu')(net)

# Shared encoder: the same sub-network (and weights) is applied to both inputs
sia = Model(inp, out)

x = sia(csr_inp)
y = sia(ch_inp)

sub = Subtract()([x, y])
mul = Multiply()([sub, sub])

mul_x = Multiply()([x, x])
mul_y = Multiply()([y, y])
sub_xy = Subtract()([x, y])

euc = Lambda(euclidean_distance)([x, y])
z = Concatenate(axis=-1)([euc, sub_xy, mul])
z = TimeDistributed(Bidirectional(LSTM(4)))(z)
z = Activation('relu')(z)
z = GlobalMaxPooling1D()(z)
z = Dense(2, activation='relu')(z)
out = Dense(1, activation = 'sigmoid')(z)

model = Model([ch_inp, csr_inp], out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

The problem is: my accuracy won't improve beyond 60.87% - I ran 10 epochs and the accuracy remains constant. Is there something I've done in my code that's causing that? Or is it perhaps an issue with my data?

[screenshot: training accuracy and loss]

I also did K-Fold Validation for some Sklearn models and got these results from the dataset:

[screenshot: statistical model results]

Additionally, an overview of my dataset is attached below:

[screenshot: dataframe snapshot]

I'm definitely struggling with this one - so literally any help here would be appreciated. Thanks!

UPDATE: I increased my data size to 1875 training samples. The accuracy improved to 70.28%, but it's still constant over all iterations.

I see two things that may be important here.

  • You're using 'relu' after the LSTM. An LSTM in Keras already has 'tanh' as its default activation. So, although you're not locking your model up, you're making it harder for it to learn, with one activation that constrains the results to a small range plus another that cuts off the negative values.

  • You're using 'relu' with very few units! ReLU with few units, bad initialization, big learning rates, and bad luck will get stuck in the zero region without any gradients.

If your loss freezes completely, it's most probably due to the second point above. And even if it doesn't freeze, it may be using just one of the 2 Dense units, for instance, making the layer very poor.
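
As a tiny numeric illustration of that stuck-at-zero behaviour (made-up values, not taken from this model):

import numpy as np

# If a unit's pre-activation is negative for every sample, relu outputs 0
# everywhere and its gradient is 0, so the unit never gets a learning signal
pre_act = np.array([-2.3, -0.7, -1.5, -4.0])    # hypothetical pre-activations
relu_out = np.maximum(pre_act, 0.0)             # -> [0. 0. 0. 0.]
relu_grad = (pre_act > 0).astype(float)         # -> [0. 0. 0. 0.], nothing to backpropagate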

You should do one of the following:

  • Your model is small, so stop using 'relu' and use 'tanh' instead. This will give your model the power it's expected to have.
  • Otherwise, you should definitely increase the number of units, both for the LSTM and for the Dense layers, so 'relu' doesn't get stuck easily.
  • You can add a BatchNormalization layer after Dense and before 'relu'; this way you guarantee that a good number of units will always be above zero.

In any case, don't use 'relu' after the LSTM.
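
A rough sketch of those suggestions applied to the head of the model above; the unit counts here are arbitrary assumptions, not tuned values:

from tensorflow.keras.layers import BatchNormalization

# More units, keep the LSTM's default 'tanh' (no extra activation after it),
# and put BatchNormalization between Dense and its 'relu'
z = TimeDistributed(Bidirectional(LSTM(16)))(z)
z = GlobalMaxPooling1D()(z)
z = Dense(16)(z)
z = BatchNormalization()(z)      # keeps a good share of units above zero
z = Activation('relu')(z)
out = Dense(1, activation='sigmoid')(z)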


The other approach would be to make the model more powerful.

For instance:

z = TimeDistributed(Bidirectional(LSTM(4)))(z)
z = Conv1D(10, 3, activation = 'tanh')(z) #or 'relu' maybe
z = MaxPooling1D()(z)
z = Conv1D(15, 3, activation = 'tanh')(z) #or 'relu' maybe
z = Flatten()(z) #unless the length is variable, then GlobalAveragePooling1D()(z)
z = Dense(10, activation='relu')(z)
out = Dense(1, activation = 'sigmoid')(z)
