Retrieving top k similar rows in a matrix for each row via cosine similarity in R
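The question "How to efficiently retrieve top K-similar vectors by cosine similarity using R?" asks how to find, for each vector of one matrix, the most similar vectors in another matrix. It was answered satisfactorily, and I'd like to adapt that answer to operate on a single matrix.
That is, for each row of a matrix I want the top k most similar other rows. I suspect the solution is very similar, but can be optimized.
This function is based on the linked answer: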
CosineSimilarities <- function(m, top.k) {
  # Computes cosine similarity between each row and all other rows in a matrix.
  #
  # Args:
  #   m: Matrix of values.
  #   top.k: Number of top rows to show for each row.
  #
  # Returns:
  #   Data frame with columns for pair of rows, and cosine similarity, for top
  #   `top.k` rows per row.
  #
  # Similarity computation
  cp <- tcrossprod(m)
  mm <- rowSums(m ^ 2)
  result <- cp / sqrt(outer(mm, mm))
  # Top similar rows (per row)
  # Use `top.k + 1` to remove the self-reference (similarity = 1)
  top <- apply(result, 2, order, decreasing=TRUE)[seq(top.k + 1), ]
  result.df <- data.frame(row.id1=c(col(top)), row.id2=c(top))
  result.df$cosine.similarity <- result[as.matrix(result.df[, 2:1])]
  # Remove same-row records and return
  return(result.df[result.df$row.id1 != result.df$row.id2, ])
}
For example:
(m <- matrix(1:9, nrow=3))
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
CosineSimilarities(m, 1)
#   row.id1 row.id2 cosine.similarity
# 2       1       2            0.9956
# 4       2       3            0.9977
# 6       3       2            0.9977
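One possible direction for the optimization, offered only as a sketch: scale each row to unit length first, so a single `tcrossprod` gives every cosine, then mask the diagonal so that no `top.k + 1` trick or after-the-fact filtering is needed. The name `TopKCosine` and the masking step are my own assumptions and do not come from the linked answer. The sketch also assumes `m` has no all-zero rows, since those would produce `NaN` similarities.

```r
# Hypothetical variant of CosineSimilarities(); name and approach are
# illustrative assumptions, not part of the linked answer.
TopKCosine <- function(m, top.k) {
  stopifnot(is.matrix(m), top.k >= 1, top.k < nrow(m))
  # Scale each row to unit length (the vector is recycled down the columns,
  # so element [i, j] is divided by the norm of row i).
  unit <- m / sqrt(rowSums(m ^ 2))
  # Cosine similarity of every pair of rows in one cross-product.
  sim <- tcrossprod(unit)
  # Mask the diagonal so a row is never selected as its own neighbour.
  diag(sim) <- -Inf
  # `sim` is symmetric, so ordering each column is the same as ordering
  # each row; keep the indices of the top.k largest entries per column.
  top <- apply(sim, 2, order, decreasing = TRUE)[seq_len(top.k), , drop = FALSE]
  result.df <- data.frame(row.id1 = c(col(top)), row.id2 = c(top))
  result.df$cosine.similarity <- sim[as.matrix(result.df)]
  result.df
}

res <- TopKCosine(matrix(1:9, nrow = 3), 1)
print(res)
```

On the 3 x 3 example above this should return the same three pairs (1-2, 2-3, 3-2) with the same similarities. Unlike the original function, rows that are exact duplicates of each other are still reported as neighbours, because only the diagonal is excluded.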