How to perform a similarity function over the columns of a tensor in TensorFlow

I have a tensor like this:

tf_a1 =      [[-0.65 0.   0.   0.   0.42  0.   0.   0.51 0.   0.34 0.]
              [0.   -0.51 0.   0.  -0.52  0.   0.   0.   0.53 0.42 0.]
              [0.    0.32 0.  -0.50 0.34  0.   0.   0.39 0.32 0.52 0.]
              [0.    0.23 0.37 0.   0.    0.37 0.37 0.   0.47 0.39 0.3 ]]

I want to apply cosine similarity over each column of this tensor. That is, I want to find the similarity of the first column versus the rest of the columns, then the second column against the rest of the columns, and so on.

I have done this using a for loop, like so:

def cosine_score(x):
    for i, arr in enumerate(x):
        if i == 0 :
            first = cosine_similarity(x[i,].reshape(1, -1), x)
        else:
            second = cosine_similarity(x[i,].reshape(1, -1), x)
            final = tf.concat((first, second), axis=0)
            first = final
    return final
sim_topics = cosine_score(tf_a1)

Now, when I want to include this in my model, I cannot use the for loop as it is. It seems I have to use tf.map_fn to iterate over the tensor.

I have also tried this:

def cosine_score(x):
    def cos_similarity(col):
        for i, arr in enumerate(col):
            if i == 0:
                first = cosine_similarity(col[i, ].reshape(1, -1), col)
            else:
                second = cosine_similarity(col[i, ].reshape(1, -1), col)
                final = tf.concat((first, second), axis=0)
                first = final
        return final
    sim = tf.map_fn(cos_similarity, x, dtype=tf.float32)
    return sim

But here I still need to remove the for loop. My problem is that if I remove the for loop and access each column separately, how can I access the rest of the columns to compare against and apply cosine similarity?

Please let me know if it's not clear.

Cosine similarity is nothing more than an L2-normalized dot product. So, in TensorFlow this should do the trick for you:

# Normalize the columns of the tensor
normalized_tensor = tf.math.l2_normalize(tf_a1, axis=0)
# Get the dot product between the columns
scores = tf.matmul(normalized_tensor, normalized_tensor, transpose_a=True)
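For reference, here is the identity the snippet above relies on, written out for two columns. The symbols A, a_i, and S are introduced here purely for illustration and do not appear in the original code:

```latex
% A is the input matrix, a_i its i-th column, \hat{a}_i that column scaled to unit L2 norm.
\cos(a_i, a_j)
  = \frac{a_i^{\top} a_j}{\lVert a_i \rVert_2 \, \lVert a_j \rVert_2}
  = \hat{a}_i^{\top} \hat{a}_j,
\qquad
\hat{a}_i = \frac{a_i}{\lVert a_i \rVert_2}

% Stacking the normalized columns into \hat{A} gives every pair at once:
S = \hat{A}^{\top} \hat{A},
\qquad
S_{ij} = \cos(a_i, a_j)
```

So normalizing along axis 0 and multiplying with transpose_a=True yields an 11 x 11 matrix for the 4 x 11 tensor in the question, with no loop over columns.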

The tensor scores contains the cosine similarity between the columns of tf_a1. Moreover, below is an equivalent NumPy implementation:

# Normalize the columns of the tensor
normalized_tensor = tf_a1 / np.linalg.norm(tf_a1, axis=0)
# Get the dot product between the columns
scores = np.dot(normalized_tensor.T, normalized_tensor)
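As a sanity check, the following self-contained NumPy sketch runs the same idea on the tensor from the question and compares it against an explicit pairwise loop. The helper name column_cosine and its eps parameter are hypothetical, added here for illustration. The eps guard is an assumption on my part: plain division by the norm produces NaN for an all-zero column, whereas tf.math.l2_normalize clamps the squared norm to a small epsilon.

```python
import numpy as np

# The 4 x 11 tensor from the question.
a = np.array([
    [-0.65, 0.0, 0.0, 0.0, 0.42, 0.0, 0.0, 0.51, 0.0, 0.34, 0.0],
    [0.0, -0.51, 0.0, 0.0, -0.52, 0.0, 0.0, 0.0, 0.53, 0.42, 0.0],
    [0.0, 0.32, 0.0, -0.50, 0.34, 0.0, 0.0, 0.39, 0.32, 0.52, 0.0],
    [0.0, 0.23, 0.37, 0.0, 0.0, 0.37, 0.37, 0.0, 0.47, 0.39, 0.3],
])


def column_cosine(x, eps=1e-12):
    """Cosine similarity between every pair of columns of a 2-D array.

    eps (hypothetical parameter) clamps the squared norm so that an
    all-zero column yields similarity 0 instead of NaN.
    """
    norms = np.sqrt(np.maximum(np.sum(x * x, axis=0), eps))
    normalized = x / norms  # broadcasts the per-column norms over the rows
    return normalized.T @ normalized


scores = column_cosine(a)

# Reference result: the explicit double loop the vectorized form replaces.
n = a.shape[1]
expected = np.empty((n, n))
for i in range(n):
    for j in range(n):
        denom = np.linalg.norm(a[:, i]) * np.linalg.norm(a[:, j])
        expected[i, j] = a[:, i] @ a[:, j] / denom
```

Row i of scores holds the similarity of column i against every column, which is what the loop in the question builds up one row at a time.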

Finally, if you want to keep only one of the triangles (for example the upper triangle) and set the main diagonal to 0, you can do the following in TensorFlow:

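# Zero out the main diagonal (the self-similarities)
zero_diag = tf.linalg.set_diag(scores, tf.zeros(tf.shape(scores)[0]))
# Keep the upper triangle; tf.linalg.band_part replaces the TF 1.x name tf.matrix_band_part
triangular = tf.linalg.band_part(zero_diag, 0, -1)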
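If you are working with the NumPy version instead, a rough equivalent of this last step is np.triu with k=1, which drops the diagonal and the lower triangle in one call. The small scores matrix below is made-up illustration data, not output from the question's tensor:

```python
import numpy as np

# Made-up 3 x 3 symmetric similarity matrix standing in for `scores`.
scores = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, -0.3],
    [0.2, -0.3, 1.0],
])

# Keep only the entries strictly above the main diagonal.
upper = np.triu(scores, k=1)

# List each unordered column pair once, together with its similarity.
rows, cols = np.triu_indices(scores.shape[0], k=1)
pairs = [(int(i), int(j), float(scores[i, j])) for i, j in zip(rows, cols)]
```

Since the similarity matrix is symmetric, the upper triangle already contains every distinct pair of columns.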
