
Siamese network for feature similarity

I have around 20k images from different domains, with features already extracted using GLCM and HOG. Each image has a feature vector of around 2000 dimensions. I want to find the similarity between features using a Siamese network. I have stored everything in a dataframe. I'm not sure how to feed the input features to a neural net. The only possibility I see is using 1D-CNN / Dense layers.

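from tensorflow.keras import activations, layers, models

n_features = 2000  # length of each feature vector

encoder = models.Sequential(name='encoder')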
encoder.add(layer=layers.Dense(units=1024, activation=activations.relu, input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation=activations.relu))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation=activations.relu))
encoder.add(layers.Dropout(0.1))

In the code above we only give the number of features as input to the encoder, but the number of features is the same for both of my images. Should I train two encoders separately and join them at the end to form an embedding layer? And how should I test?

For a Siamese network you would want to have one network and train it on different sets of data.

So say you have two sets of data, X0 and X1, that have the same shape; you would do

import tensorflow as tf
from tensorflow.keras import models
from tensorflow.keras import layers


# number of features
n_features = 2000

# fake data w/batch size 4
X0 = tf.random.normal([4, n_features])
X1 = tf.random.normal([4, n_features])

# siamese encoder model
encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(
    units=1024, activation="relu", input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation="relu"))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation="relu"))
encoder.add(layers.Dropout(0.1))

# send both sets of data through same model
enc0 = encoder(X0)
enc1 = encoder(X1)

# compare the two outputs
# note: the Keras CosineSimilarity *loss* returns the negative of the cosine
# similarity, so values closer to -1 mean the two outputs are more similar
compared = tf.keras.losses.CosineSimilarity(reduction="none")(enc0, enc1)
print(f"cosine similarity loss of output: {compared.numpy()}")
# example output (values vary with the random input and weights):
# cosine similarity loss of output: [-0.5785658 -0.6405066 -0.57274437 -0.6017716]

# now do optimization ...

There are numerous ways to compare the outputs, cosine similarity being one of them; I included it only for illustration, and you may need some other metric.
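As an illustration of swapping the metric, here is a minimal NumPy sketch that computes both cosine similarity and Euclidean distance row by row. The function names and the random arrays standing in for enc0 and enc1 are assumptions made for this example, not part of the answer above.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    a_norm = np.linalg.norm(a, axis=1)
    b_norm = np.linalg.norm(b, axis=1)
    # eps guards against division by zero for all-zero rows
    return np.sum(a * b, axis=1) / np.maximum(a_norm * b_norm, eps)

def euclidean_distance(a, b):
    """Row-wise Euclidean distance between two (batch, dim) arrays."""
    return np.linalg.norm(a - b, axis=1)

# Hypothetical stand-ins for the encoder outputs (batch of 4, 256 dims).
rng = np.random.default_rng(0)
emb0 = rng.random((4, 256)).astype(np.float32)
emb1 = rng.random((4, 256)).astype(np.float32)

print(cosine_similarity(emb0, emb1))   # higher means more similar
print(euclidean_distance(emb0, emb1))  # lower means more similar
```

Which one works better probably depends on the loss you train with, so treat this as a starting point rather than a recommendation.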

There is only one network, which is simply duplicated. All weights are shared, so you are training one network and just running it twice at each step of learning. You should pick two samples from your dataset and label the pair 1 if they come from the same class and 0 otherwise.

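import tensorflow as tf
from tensorflow.keras import models
from tensorflow.keras import layers

n_features = 2000

def cos_similarity(x):
    # per-sample cosine similarity, output shape (batch, 1)
    x1, x2 = x
    x1 = tf.math.l2_normalize(x1, axis=-1)
    x2 = tf.math.l2_normalize(x2, axis=-1)
    return tf.reduce_sum(x1 * x2, axis=-1, keepdims=True)

# shape must be a tuple, hence the trailing comma
inp1 = layers.Input(shape=(n_features,))
inp2 = layers.Input(shape=(n_features,))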

encoder = models.Sequential(name='encoder')
encoder.add(layer=layers.Dense(
    units=1024, activation="relu", input_shape=[n_features]))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=512, activation="relu"))
encoder.add(layers.Dropout(0.1))
encoder.add(layer=layers.Dense(units=256, activation="relu"))
encoder.add(layers.Dropout(0.1))

out1 = encoder(inp1)
out2 = encoder(inp2)

similarity = layers.Lambda(cos_similarity)([out1,out2])

model = models.Model(inputs=[inp1,inp2],outputs=[similarity])

model.compile(optimizer='adam',loss='mse')
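The pair-labelling step described above could be sketched roughly as follows. This assumes the features are available as a NumPy array with one row per image and that there is a class label per image; `make_pairs`, the `label` column mentioned in the comment, and the toy data are all hypothetical names for illustration.

```python
import numpy as np

def make_pairs(features, labels, n_pairs, seed=0):
    """Sample random pairs; target is 1.0 for same class, 0.0 otherwise."""
    rng = np.random.default_rng(seed)
    idx_a = rng.integers(0, len(features), size=n_pairs)
    idx_b = rng.integers(0, len(features), size=n_pairs)
    targets = (labels[idx_a] == labels[idx_b]).astype(np.float32)
    return features[idx_a], features[idx_b], targets

# Toy data standing in for the real features. With a DataFrame df that has a
# (hypothetical) 'label' column, the arrays might be obtained with
#   features = df.drop(columns=['label']).to_numpy(dtype='float32')
#   labels = df['label'].to_numpy()
rng = np.random.default_rng(42)
features = rng.random((100, 2000)).astype(np.float32)
labels = rng.integers(0, 5, size=100)

x_a, x_b, y = make_pairs(features, labels, n_pairs=32)
print(x_a.shape, x_b.shape, y.shape)  # (32, 2000) (32, 2000) (32,)
```

The resulting arrays could then be passed to something like model.fit([x_a, x_b], y). Note that uniform sampling tends to produce far more 0 pairs than 1 pairs when there are many classes, so you may want to balance the two kinds of pair.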

For testing, first compute the same features for the new images (you said each has about 2000 dimensions). Then run

model.predict([hog_feature_1, hog_feature_2])

and you have the similarity of the pair. If you want the output feature (embedding) of a single image instead, run encoder.predict(hog_feature). By the way, I recommend not combining HOG features with a Siamese network. Extract the image features with this network itself: change the input shape and train on the images.
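One possible way to use the embeddings at test time is to rank a set of known images by their similarity to a query image. The sketch below assumes the embeddings were already computed with the trained encoder; `rank_by_similarity` and the random arrays are hypothetical placeholders for this example.

```python
import numpy as np

def rank_by_similarity(query_emb, gallery_emb, eps=1e-12):
    """Return gallery indices sorted from most to least similar, with scores."""
    q = query_emb / max(np.linalg.norm(query_emb), eps)
    norms = np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    g = gallery_emb / np.maximum(norms, eps)
    scores = g @ q                 # cosine similarity to every gallery row
    order = np.argsort(-scores)    # descending similarity
    return order, scores[order]

# Random stand-ins for encoder outputs: 10 gallery images, 1 query image.
rng = np.random.default_rng(0)
gallery = rng.random((10, 256)).astype(np.float32)
query = rng.random(256).astype(np.float32)

order, scores = rank_by_similarity(query, gallery)
print(order[:3], scores[:3])  # three most similar gallery images
```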
