

What's the best WordNet function for similarity between words?

I aim to compute pairwise similarities for about 10,000 words. I'm using the "word.path_similarity(otherword)" method of the WordNet library, but the path_similarity scores I'm getting all fall in the range 0-0.1 instead of being distributed over 0-1. How is it possible that the similarities between 10,000 random words all end up in such a narrow range?

Is there a better way to use WordNet to find the similarity between two words?

For context, here's how this is calculated:

  1. Calculate the length of the shortest path between the two synsets/words, inclusive (i.e. counting both endpoints).

  2. Return the score as 1/pathlen.
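The two steps can be sketched in plain Python on a small, hypothetical is-a taxonomy (the node names below are made up for illustration, not real WordNet synsets). As far as I know, NLTK's Synset.path_similarity computes essentially 1 / (shortest_path_distance + 1), where the distance counts edges through a shared hypernym; the sketch follows that idea but does not reproduce NLTK's special cases.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> list of parents); not real WordNet data.
HYPERNYMS = {
    "entity": [],
    "physical_entity": ["entity"],
    "abstraction": ["entity"],
    "object": ["physical_entity"],
    "organism": ["object"],
    "animal": ["organism"],
    "dog": ["animal"],
    "cat": ["animal"],
    "artifact": ["object"],
    "car": ["artifact"],
    "idea": ["abstraction"],
}

def ancestor_distances(node):
    """Map each ancestor of `node` (and node itself, at 0) to the fewest is-a edges up to it."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first, so the first visit is the shortest distance
        current = queue.popleft()
        for parent in HYPERNYMS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist

def path_similarity(a, b):
    """Score = 1 / pathlen, where pathlen counts synsets on the shortest path, both ends included."""
    up_a, up_b = ancestor_distances(a), ancestor_distances(b)
    common = up_a.keys() & up_b.keys()
    if not common:
        return None  # no shared ancestor, so no connecting path in this sketch
    edges = min(up_a[c] + up_b[c] for c in common)  # step 1: shortest path, in edges
    pathlen = edges + 1  # synsets on the path = edges + 1 (inclusive count)
    return 1 / pathlen  # step 2

for a, b in [("dog", "dog"), ("dog", "cat"), ("dog", "car"), ("dog", "idea")]:
    print(f"{a} vs {b}: {path_similarity(a, b):.3f}")
# dog vs dog: 1.000
# dog vs cat: 0.333
# dog vs car: 0.167
# dog vs idea: 0.125
```

Even in this tiny tree, pairs that only meet near the root already score well below 0.2.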

Therefore a score < 0.2 indicates a path length > 5. Because the count includes the two input synsets, there are at least 4 synsets between them. By the same reasoning, a score below 0.1 (the range in the question) means a path length > 10, i.e. at least 9 synsets in between.

With that said, your complaint seems to be: "according to this metric, two words chosen at random are pretty consistently unrelated! What's going on?" Well, your similarity metric is telling you that random words are generally not closely related. This shouldn't be surprising. Why are you calculating similarities between random words to begin with?
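To see why random pairs pile up at low scores, consider an idealized, hypothetical taxonomy: a complete tree with branching factor 10 and depth 4, so it has exactly 10,000 leaf "words". Real WordNet is not shaped like this. Two leaves whose lowest common ancestor sits h levels up are 2h edges apart, so pathlen = 2h + 1. The sketch computes how often each h occurs for two distinct random leaves.

```python
from fractions import Fraction

def lca_height_distribution(branching, depth):
    """For two distinct random leaves of a complete tree, return {h: P(LCA is h levels up)}."""
    leaves = branching ** depth
    # Fix one leaf; of the other (leaves - 1) leaves, branching**h - branching**(h - 1)
    # share a subtree of height h with it but not one of height h - 1.
    return {
        h: Fraction(branching ** h - branching ** (h - 1), leaves - 1)
        for h in range(1, depth + 1)
    }

# Hypothetical: 10,000 leaves in a complete tree with branching factor 10 and depth 4.
for h, p in lca_height_distribution(10, 4).items():
    pathlen = 2 * h + 1  # h edges up, h edges down, +1 to count synsets inclusively
    print(f"score 1/{pathlen} = {1 / pathlen:.3f}: {float(p):.1%} of pairs")
```

In this toy model about 90% of pairs (9000 of every 9999) only meet at the root and get the minimum score of 1/9. A deeper hierarchy pushes that floor lower.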
