
How can I calculate cosine similarity between the first row of my matrix and each of the other rows in R?

This is my_matrix:

ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704   194907960 196950156 194139014 153444738 192982501 192891196
1 237      0.00      0.00      0.00      0.00      0.00      0.00         0      0.01         0         0         0         0         0
2 261      0.01      0.00      0.00      0.00      0.00      0.00         0      0.00         0         0         0         0         0
3 290      0.00      0.00      0.01      0.01      0.00      0.00         0      0.00         0         0         0         0         0
4 483      0.00      0.00      0.00      0.00      0.00      0.01         0      0.00         0         0         0         0         0
5 533      0.00      0.01      0.00      0.00      0.00      0.00         0      0.00         0         0         0         0         0
6 534      0.00      0.00      0.00      0.00      0.01      0.00         0      0.00         0         0         0         0         0

Here is my code:

b=my_matrix[1,2:length(my_matrix)]

for (i in nrow(my_matrix)) {
 res[i]=cosine(b,my_matrix[i,2:length(my_matrix)])
}

I used the "lsa" package and want to get the cosine similarity between vector b and every other row of my_matrix, but my code throws an error that says:

argument mismatch. Either one matrix or two vectors needed as input.

What should I do to fix my problem? Many thanks in advance.

Package "isa", which is not available for R version 3.2.2, is not really necessary. Just do it yourself, using the definition of cosine similarity:

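my_matrix <- as.matrix(my_matrix)  # Make sure that "my_matrix" is indeed a "matrix".
v <- as.vector(my_matrix[1,-1])    # first row, without the "ui" column
M <- my_matrix[-1,-1]              # all other rows, without the "ui" column
cosSim <- ( M %*% v ) / sqrt( sum(v*v) * rowSums(M*M) )  # dot products divided by products of lengths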

The first line is only necessary if my_matrix is not yet a matrix but a data.frame.
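For reference, the definition being used, for two vectors u and v of the same length, is written out below. For row i of M, the entry i of M %*% v is the numerator, and sum(v*v) * rowSums(M*M) supplies the product of the squared lengths under the square root:

$$\cos(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert} = \frac{\sum_{k} u_k v_k}{\sqrt{\sum_{k} u_k^2}\,\sqrt{\sum_{k} v_k^2}}$$

$$\text{cosSim}_i = \frac{\sum_{k} M_{ik} v_k}{\sqrt{\left(\sum_{k} v_k^2\right)\left(\sum_{k} M_{ik}^2\right)}}$$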

A possible explanation for the original error message shown in the question:

I guess that the object my_matrix used in the question's code, which caused the error message

argument mismatch. Either one matrix or two vectors needed as input.

was a data.frame, not a matrix. If so, the arguments b and my_matrix[i,2:length(my_matrix)] in the call of the cosine function are again data.frames, not the two vectors that cosine expects.
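A minimal base-R illustration of this point, using a small made-up data.frame (df, c1 and c2 are hypothetical names): taking one row and several columns of a data.frame yields another one-row data.frame, whereas the same subset of a matrix yields a plain numeric vector.

```r
# Small made-up data.frame standing in for my_matrix
df <- data.frame(ui = c(237, 261, 290),
                 c1 = c(0.00, 0.01, 0.00),
                 c2 = c(0.01, 0.00, 0.01))

# One row, several columns of a data.frame: still a data.frame
row_df <- df[1, 2:3]
print(class(row_df))   # "data.frame"

# The same subset after converting to a matrix: a plain (named) numeric vector
m <- as.matrix(df)
row_vec <- m[1, 2:3]
print(class(row_vec))  # "numeric"

# unlist() is another way to flatten a one-row data.frame into a vector
row_vec2 <- unlist(df[1, 2:3])
print(row_vec2)
```

Either as.matrix() on the whole object or unlist() on the extracted row gives cosine the kind of argument its error message asks for.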

As an aside:

Even if my_matrix is coerced to a matrix, the code in the question will throw an error message, since length(my_matrix) (the total number of elements) is larger than the number of columns, and hence my_matrix[i,2:length(my_matrix)] selects columns that do not exist ("subscript out of bounds"). The i-th row of my_matrix without the first column is my_matrix[i,2:ncol(my_matrix)] or, shorter, my_matrix[i,-1].
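Besides the column indexing, two more things keep the question's loop from working: for (i in nrow(my_matrix)) runs only once, with i equal to the number of rows, and res is never created in the code shown before it is indexed. A minimal sketch of the loop with those fixes, assuming a numeric matrix whose first column is the ui id; cos_sim is a hypothetical base-R helper standing in for lsa's cosine so the example runs without lsa, and the data are made up:

```r
# Hypothetical stand-in for lsa's cosine() on two numeric vectors
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x * x) * sum(y * y))

# Made-up matrix: first column is the "ui" id, the rest are features
my_matrix <- rbind(c(237, 0.01, 0.00, 0.01),
                   c(261, 0.01, 0.00, 0.00),
                   c(290, 0.00, 0.01, 0.01))

b <- my_matrix[1, -1]                  # first row without the id column
res <- numeric(nrow(my_matrix))        # create the result vector before the loop
for (i in seq_len(nrow(my_matrix))) {  # visit every row, not only i = nrow(my_matrix)
  res[i] <- cos_sim(b, my_matrix[i, -1])
}
res  # first entry is the first row compared with itself
```

With lsa loaded, cosine(b, my_matrix[i, -1]) should also work in place of cos_sim(), since it accepts two vectors.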

You can try this:

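library(lsa)                            # provides cosine()
A <- my_matrix[, -1]                    # drop the "ui" column
b <- A[1,]                              # first row
res <- apply(A[-1, ], 1, cosine, y=b)   # cosine of every remaining row with b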

This code ran without an error:

d <- read.table(skip=1, text="ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704   194907960 196950156 194139014 153444738 192982501 192891196
1 237      0.00      0.00      0.00      0.00      0.00      0.00         0      0.01         0         0         0         0         0
2 261      0.01      0.00      0.00      0.00      0.00      0.00         0      0.00         0         0         0         0         0
3 290      0.00      0.00      0.01      0.01      0.00      0.00         0      0.00         0         0         0         0         0
4 483      0.00      0.00      0.00      0.00      0.00      0.01         0      0.00         0         0         0         0         0
5 533      0.00      0.01      0.00      0.00      0.00      0.00         0      0.00         0         0         0         0         0
6 534      0.00      0.00      0.00      0.00      0.01      0.00         0      0.00         0         0         0         0         0")

my_matrix <- as.matrix(d)[,-1]  # drop the row-number column

library(lsa)
A <- my_matrix[, -1]  
b <- A[1,]
res <- apply(A[-1, ], 1, cosine, y=b)

But the result is a vector with all values 0 (i.e. the first row is orthogonal to the others). That depends on your data and is easily seen in this case: the only non-zero entry of the first row (column 194907960) is zero in every other row.
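To check the orthogonality claim without lsa, here is a minimal base-R sketch that rebuilds only the 13 feature columns of the sample data (the ui values are left out) by setting the non-zero entries shown in the table, then applies the M %*% v formula from the first answer:

```r
# Feature part of the question's sample data: 6 rows x 13 columns,
# all zero except the entries set below
A <- matrix(0, nrow = 6, ncol = 13)
A[1, 8] <- 0.01
A[2, 1] <- 0.01
A[3, c(3, 4)] <- 0.01
A[4, 6] <- 0.01
A[5, 2] <- 0.01
A[6, 5] <- 0.01

# Column indices of the non-zero entries in each row
nz <- lapply(seq_len(nrow(A)), function(i) which(A[i, ] != 0))

# How many non-zero columns each other row shares with row 1
shared <- sapply(nz[-1], function(cols) length(intersect(nz[[1]], cols)))

# No shared columns means every dot product with row 1, and hence every cosine, is 0
v <- A[1, ]
M <- A[-1, ]
cos_first <- as.vector((M %*% v) / sqrt(sum(v * v) * rowSums(M * M)))
cos_first  # 0 0 0 0 0
```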

The cosine function from the lsa package calculates the cosine measure between all column vectors of a matrix, therefore:

cosine(t(my_matrix[,2:ncol(my_matrix)]))

will return a matrix in which the first column is the vector of cosine measures between the first data row of my_matrix (b in your example) and every row, starting with the first row itself (so that entry is 1).
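If lsa is not installed, the same all-pairs matrix can be sketched in base R by scaling each row to unit length and taking all pairwise dot products. row_cosine is a hypothetical helper name and the data are made up; a row that is entirely zero would produce NaN, because its length is 0:

```r
# Hypothetical base-R helper: cosine similarity between all rows of X
row_cosine <- function(X) {
  X <- as.matrix(X)
  norms <- sqrt(rowSums(X * X))  # Euclidean length of each row
  U <- X / norms                 # divides row i by norms[i] (recycling runs down columns)
  tcrossprod(U)                  # U %*% t(U): entry [i, j] is cos(row i, row j)
}

X <- rbind(c(0.01, 0.00, 0.01),
           c(0.01, 0.00, 0.00),
           c(0.00, 0.01, 0.01))
S <- row_cosine(X)
S[, 1]  # row 1 against every row, starting with itself
```

For the question's data, as.vector(row_cosine(my_matrix[, -1])[, 1]) should correspond to the lsa expression above.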

If you just want the vector of cosine similarities for the first row:

as.vector(cosine(t(my_matrix[,2:ncol(my_matrix)]))[,1])

The nth element of this vector is the cosine similarity between the first row and the nth row of the original matrix.

Let v be your 1 × m vector and M your m × n matrix whose columns are the vectors to compare with v:

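sim_cos_v <- numeric(ncol(M))  # create the result vector before filling it
for (i in seq_len(ncol(M))) {
  # dot product of v with column i, divided by the product of their Euclidean (Frobenius) norms
  sim_cos_v[i] <- (v %*% as.vector(M[, i])) / (norm(as.matrix(v), "f") * norm(as.matrix(M[, i]), "f"))
}
sim_cos_v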
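The loop can also be replaced by a vectorised sketch along the same lines: crossprod(M, v) computes the dot product of v with every column at once, and colSums(M * M) every column's squared length. cos_cols is a hypothetical name and the numbers are made up:

```r
# Hypothetical vectorised version: v has length m, M is m x n
cos_cols <- function(v, M) {
  v <- as.vector(v)
  # crossprod(M, v) equals t(M) %*% v: one dot product per column of M
  as.vector(crossprod(M, v)) / (sqrt(sum(v * v)) * sqrt(colSums(M * M)))
}

v <- c(0.01, 0.00, 0.01)
M <- cbind(c(0.01, 0.00, 0.00),
           c(0.00, 0.01, 0.01),
           c(0.01, 0.00, 0.01))
cos_cols(v, M)
```

For the question's layout, where the vectors are rows, one would pass v = my_matrix[1, -1] and M = t(my_matrix[-1, -1]).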

