How to compute similarities based on co-occurrence matrix?

I have an item-item matrix (1877 x 1877). The values in the matrix represent the number of times two items occurred together. How can I determine the similarity between two items? Through reading, I found a few options, but I am not sure about these approaches. Any input to get started is appreciated.

  1. Use cosine similarity between two vectors
  2. Turn this into a graph and use measures like SimRank to compute similarity, possibly using the co-occurrence count as the weight between two nodes.

I would recommend using spatial cosine similarity. Alternatively, you could calculate Jaccard similarity for each item pair.
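As a rough illustration of both measures, here is a minimal NumPy sketch. It assumes each row of the co-occurrence matrix is treated as that item's feature vector, and, for Jaccard, that counts are binarized to "co-occurred at least once" (an assumption, since Jaccard is defined on sets). The function names and the 4 x 4 toy matrix are made up for the example.

```python
import numpy as np

def cosine_similarity_matrix(cooc):
    """Cosine similarity between every pair of rows."""
    cooc = np.asarray(cooc, dtype=float)
    norms = np.linalg.norm(cooc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # items with an all-zero row get similarity 0
    unit = cooc / norms
    return unit @ unit.T

def jaccard_similarity_matrix(cooc):
    """Jaccard similarity on binarized rows: |A & B| / |A | B|."""
    present = (np.asarray(cooc) > 0).astype(int)
    intersection = present @ present.T
    row_sums = present.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - intersection
    # Define the similarity as 0 where both rows are empty (union == 0).
    return np.divide(
        intersection,
        union,
        out=np.zeros(intersection.shape, dtype=float),
        where=union > 0,
    )

# Hypothetical toy co-occurrence counts (symmetric, zero diagonal).
cooc = np.array([
    [0, 5, 1, 0],
    [5, 0, 1, 0],
    [1, 1, 0, 4],
    [0, 0, 4, 0],
])

cos_sim = cosine_similarity_matrix(cooc)
jac_sim = jaccard_similarity_matrix(cooc)
```

Note that both measures compare co-occurrence profiles (which other items each item appears with), not the direct count between the two items themselves.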

After calculating either similarity matrix (affinity matrix), you can use a spectral (or spatial) clustering algorithm, such as sklearn's spectral clustering algorithm, to group those items.
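A minimal sketch of that clustering step, assuming scikit-learn is installed and that the similarity matrix is symmetric and non-negative. The 6 x 6 matrix and the choice of two clusters are invented for illustration; with real data the number of clusters has to be chosen.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical similarity matrix with two obvious groups: items 0-2 and 3-5.
affinity = np.full((6, 6), 0.05)
affinity[:3, :3] = 0.9
affinity[3:, 3:] = 0.9
np.fill_diagonal(affinity, 1.0)

# affinity="precomputed" tells sklearn that the input is already a
# similarity matrix rather than raw feature vectors.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)
```

Each entry of `labels` is the cluster index assigned to the corresponding item.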

You can treat it as 1877 items with 1877 features each. If two items are similar, then their co-occurrences will be similar. Given that, you might use NearestNeighbors to find the closest one. There are many metrics available.
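The idea could look roughly like this with scikit-learn's `NearestNeighbors`. The cosine metric is just one possible choice, and the 4 x 4 matrix is a made-up stand-in for the real 1877 x 1877 one.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical co-occurrence counts: each row is one item's feature vector.
cooc = np.array([
    [0, 1, 8, 7],
    [1, 0, 7, 8],
    [8, 7, 0, 1],
    [7, 8, 1, 0],
], dtype=float)

# Ask for 2 neighbors because, when querying the training rows themselves,
# each item's nearest neighbor is the item itself (distance 0).
nn = NearestNeighbors(n_neighbors=2, metric="cosine")
nn.fit(cooc)
distances, indices = nn.kneighbors(cooc)

# Column 1 holds the closest *other* item for each row.
closest_other = indices[:, 1]
```

In this toy matrix, items 0 and 1 co-occur with the same partners in similar proportions, so they end up as each other's nearest neighbor, and likewise items 2 and 3.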

Also, preprocessing the data may help you. I do not know its distribution, but you might want to normalize the values into the range [0;1] or do something like that.
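One simple way to bring the counts into [0;1] is min-max scaling over the whole matrix. This is only one option among several; the helper name and the toy values are made up for the example.

```python
import numpy as np

def min_max_scale(cooc):
    """Linearly rescale all entries so the smallest is 0 and the largest is 1."""
    cooc = np.asarray(cooc, dtype=float)
    lo, hi = cooc.min(), cooc.max()
    if hi == lo:
        # Constant matrix: nothing to scale, return zeros to avoid dividing by 0.
        return np.zeros_like(cooc)
    return (cooc - lo) / (hi - lo)

# Hypothetical toy counts.
cooc = np.array([
    [0, 5, 1],
    [5, 0, 10],
    [1, 10, 0],
])
scaled = min_max_scale(cooc)
```

If the counts are heavily skewed, a few very frequent pairs will dominate after linear scaling, so it is worth looking at the distribution first.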

If your co-occurrence matrix is symmetrical, you don't need to normalize it. For more information about normalization of symmetrical and asymmetrical co-occurrence matrices, you can refer to this paper: Leydesdorff, L. and Vaughan, L., 2006. Co-occurrence matrices and their applications in information science: Extending ACA to the Web environment. Journal of the American Society for Information Science and Technology, 57(12), pp. 1616-1628.
