
How can I find token similarity in spaCy?

I am trying to calculate token similarity in spaCy, i.e. how close word tokens are to one another. I am using spaCy version 2.0.5. Here is my trivial example.

import spacy
from spacy.lang.en import English
from spacy.tokenizer import Tokenizer

nlp = spacy.load('en') 

x = nlp(u'apple')
y = nlp(u'apple')

x.similarity(y)

This returns -81216639937292144.0, but I had expected it to be 1.0.

In addition,

x = nlp(u'apple')
y = nlp(u'apples')
x.similarity(y)

returns 0.0038385278814858344, which seems wrong as well.

How should I compute token similarity so that it works? I would really like to stay within spaCy (rather than use a separate string distance package), but I would also welcome suggestions if this just can't be done in spaCy.

I tried the same thing using spaCy version 0.100.7. It works fine for me:

import spacy
from spacy.en import English
from spacy.tokenizer import Tokenizer

nlp = spacy.load('en') 

x = nlp(u'apple')
y = nlp(u'apple')

print (x.similarity(y)) # prints 0.999999947205

x = nlp(u'apple')
y = nlp(u'apples')

print (x.similarity(y)) # prints 0.6678450944

Can you please check your version of spaCy? Also, have you installed only the default en model?
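A quick way to check the installed version is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is made up for this illustration. On older interpreters, `pip show spacy` reports the same information.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report which spaCy version (if any) is visible to this interpreter.
spacy_version = installed_version("spacy")
print("spacy:", spacy_version or "not installed")
```

This only reports the library version. To see which models are installed and whether they are compatible, `python -m spacy validate` is the command to run.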

I faced the same problem with version 2.0.5. You can roll back to version 2.0.2, where you will get a score like 1.0000000593284066 when comparing 'apples' to 'apples'.

To do this, first uninstall all the libraries that spaCy 2.0.5 depends on:

for dep in $(pip show spacy | grep Requires | sed 's/Requires: //g; s/,//g') ; do pip uninstall -y $dep ; done

Then install version 2.0.2:

pip install spacy=='2.0.2'

Next, validate the installation:

python -m spacy validate

It might ask you to install some library, such as ftfy, and when you try to install it, pip will report that it is already installed. In that case, uninstall the library first and then reinstall it (this might happen 3-4 times for different libraries). This is necessary because a lot of libraries get updated to their latest versions while installing spaCy 2.0.5. Lastly, download the model:

python -m spacy download en_core_web_sm
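As a cross-check on whatever score comes back after reinstalling, the expected behaviour can be reproduced by hand. spaCy's default similarity estimate is the cosine similarity of (averaged) word vectors, so identical inputs should score about 1.0 and the result should always lie between -1 and 1. The sketch below uses plain Python and made-up toy vectors rather than real model vectors; `cosine_similarity` is a helper written for this illustration, not a spaCy function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy vectors standing in for word vectors (the numbers are invented).
apple = [0.2, -0.5, 0.7]
apples = [0.25, -0.4, 0.65]

same = cosine_similarity(apple, apple)
close = cosine_similarity(apple, apples)
print(same, close)
```

With a loaded model, the same helper could be called on `x.vector` and `y.vector`. A value far outside the range -1 to 1, like the one in the question, indicates a problem with the installation or the vectors rather than a real similarity score.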


 