
Calculate cosine similarity of two words in R?

I have a text file and would like to create semantic vectors for each word in the file. I would then like to extract the cosine similarity for about 500 pairs of words. What is the best package in R for doing this?

You can use the lsa library. Its cosine function takes a matrix as input and returns a matrix of cosine similarities.
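A minimal sketch of that approach (assumes the lsa package is installed; the toy vectors here stand in for real semantic vectors, which would come from a term-document matrix or an embedding model):

```r
library(lsa)

# Hypothetical toy "semantic vectors" for two words
vec_king  <- c(0.5, 0.8, 0.1)
vec_queen <- c(0.4, 0.9, 0.2)

# cosine() accepts two vectors and returns their cosine similarity;
# given a single matrix, it compares all pairs of columns instead.
cosine(vec_king, vec_queen)
```

For the 500 word pairs, you would put the word vectors as columns of one matrix and call `cosine()` on it once, then look up the pairs you need in the resulting similarity matrix.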

If I understand your problem correctly, you want the cosine similarity of two vectors of words. Let us start with the cosine similarity of two words only:

library(stringdist)
d <- stringdist("ca","abc",method="cosine")

The result is d = 0.1835034, as expected.
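That value can be reproduced by hand in base R: stringdist's "cosine" method compares q-gram count profiles (with the default q = 1, simply character counts), and the distance is one minus the cosine of those count vectors.

```r
# Character counts over the union of characters {a, b, c}
x <- c(a = 1, b = 0, c = 1)  # counts in "ca"
y <- c(a = 1, b = 1, c = 1)  # counts in "abc"

# Cosine distance = 1 - cosine similarity of the count profiles
d <- 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
d  # 0.1835034, matching stringdist("ca", "abc", method = "cosine")
```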

There is also a function stringdistmatrix() in that package which calculates the distance between all pairs of strings:

> d <- stringdistmatrix(c('foo','bar','boo','baz'))
> d
  1 2 3
2 3    
3 1 2  
4 3 1 2
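Note that stringdistmatrix() defaults to method = "osa" (optimal string alignment), which is why the matrix above contains integer edit distances rather than cosine distances. To get pairwise cosine distances, pass the method explicitly (a sketch, assuming the stringdist package is installed):

```r
library(stringdist)

# Pairwise cosine distances for all strings, returned as a "dist"
# object holding the lower triangle of the distance matrix
d <- stringdistmatrix(c("foo", "bar", "boo", "baz"), method = "cosine")
d
```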

For your purpose, you can simply use something like this:

stringdist(c("ca","abc"),c("aa","abc"),method="cosine")

The results are the distances between ca and aa on the one hand, and between abc and abc on the other:

0.2928932 0.0000000

Disclaimer: The library stringdist is brand new (June 2019), but seems to work nicely. I am not associated with the authors of the library.
