
How to change the code to find the euclidean distance (not cosine) between words in a word2vec implementation?

The following code, when run, gives the cosine distance between two words.

model.wv.distance('word1', 'word2')

How do I find the euclidean distance between two words? I am using gensim for the word2vec implementation.

Usually cosine distance is preferred in this domain.
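One way to see the practical difference between the two measures is that cosine distance ignores vector length, while euclidean distance does not. The sketch below uses made-up NumPy vectors (not vectors from a real model), and the helper names `cosine_distance` and `euclidean_distance` are just illustrative:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between a and b
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Length of the difference vector
    return np.linalg.norm(a - b)

v = np.array([1.0, 2.0, 3.0])
w = 10.0 * v  # same direction as v, ten times longer

cos_d = cosine_distance(v, w)     # essentially zero: the directions match
euc_d = euclidean_distance(v, w)  # large: the lengths differ a lot
```

So two vectors pointing the same way count as identical under cosine distance but can be far apart under euclidean distance.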

But if you need euclidean distance, you can just request the raw vectors for each word, find the difference, and use a basic `numpy.linalg.norm` operation, as per this StackOverflow answer:

How can the Euclidean distance be calculated with NumPy?

Specifically:

import numpy as np
euc_dist = np.linalg.norm(model.wv['word1'] - model.wv['word2'])
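If you need this in more than one place, you could wrap it in a small helper. This is only a sketch: the function name `euclidean_distance` is my own, and the `toy_vectors` dict is a hypothetical stand-in for a trained `model.wv` so that the snippet runs without gensim. The helper relies only on being able to index by word, as `model.wv['word1']` does above.

```python
import numpy as np

def euclidean_distance(vectors, word_a, word_b):
    """Euclidean distance between the vectors of two words.

    `vectors` can be anything indexable by word, such as model.wv.
    """
    return float(np.linalg.norm(vectors[word_a] - vectors[word_b]))

# Hypothetical toy vectors standing in for a trained model.wv
toy_vectors = {
    "word1": np.array([1.0, 0.0, 0.0]),
    "word2": np.array([0.0, 1.0, 0.0]),
    "word3": np.array([1.0, 0.0, 0.0]),
}

d12 = euclidean_distance(toy_vectors, "word1", "word2")
d13 = euclidean_distance(toy_vectors, "word1", "word3")
```

With a real model you would call it as `euclidean_distance(model.wv, 'word1', 'word2')`.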
