How to find similar words with FastText?
I am playing around with FastText (https://pypi.python.org/pypi/fasttext), which is quite similar to Word2Vec. Since it seems to be a fairly new library without many built-in functions yet, I was wondering how to extract morphologically similar words.
For example: model.similar_word("dog") -> dogs. But there is no such built-in function.
If I type model["dog"] I only get the vector, which could be used to compute cosine similarity: model.cosine_similarity(model["dog"], model["dogs"]).
Do I have to write some sort of loop and run cosine_similarity on all possible pairs in a text? That would take time!
Use Gensim: load the .vec file trained by fastText with load_word2vec_format, then call the most_similar() method to find similar words!
You can install the pyfasttext library to extract the most similar or nearest words to a particular word.
from pyfasttext import FastText
model = FastText('model.bin')
model.nearest_neighbors('dog', k=2000)
Alternatively, you can install the latest development version of fasttext from the GitHub repository:
import fasttext
model = fasttext.load_model('model.bin')
model.get_nearest_neighbors('dog', k=100)
You should use gensim to load model.vec and then get similar words:
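import gensim
# Word2Vec.load_word2vec_format is deprecated in newer gensim releases; KeyedVectors provides the loader
m = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
m.most_similar(...)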
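On the worry about looping over all possible pairs: a nearest-neighbour lookup only needs one pass over the vocabulary per query word, not every pair. The following is a rough pure-Python sketch of that idea, not gensim's actual implementation. The names cosine and nearest and the toy vectors are made up for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def nearest(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word` (one pass)."""
    query = vectors[word]
    scored = (
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != word
    )
    # nlargest keeps only the topn best (score, word) pairs
    return [(other, score) for score, other in heapq.nlargest(topn, scored)]


# Hypothetical toy vectors; a real model would have 100-300 dimensions.
toy_vectors = {
    "dog": [1.0, 0.9, 0.1],
    "dogs": [0.9, 1.0, 0.2],
    "cat": [0.7, 0.2, 0.6],
    "car": [0.0, 0.1, 1.0],
}
neighbours = nearest(toy_vectors, "dog", topn=2)
print(neighbours)
```

Libraries such as gensim do the same ranking with vectorised matrix operations, which is why their lookups stay fast on large vocabularies.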
You can install and import the gensim library, then use it to extract the most similar words from the model that you downloaded from FastText.
Use this:
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
similar = model.most_similar(positive=['man'],topn=10)
The topn parameter sets how many results are returned; here you get the 10 most similar words.
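For context, the .vec file read by load_word2vec_format is plain text in the word2vec format: a header line with the vocabulary size and vector dimension, then one line per word with its components. Here is a minimal sketch of a reader for that layout, assuming space-separated fields. The helper name load_vec and the sample data are hypothetical, and in practice gensim's loader is the better choice.

```python
import io


def load_vec(handle):
    """Parse word2vec text format into a {word: [floats]} dict.

    First line: '<vocab_size> <dim>'. Each following line: '<word> v1 ... v_dim'.
    """
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


# Hypothetical three-word file with 2-dimensional vectors.
sample = io.StringIO("3 2\ndog 0.5 0.25\ndogs 0.5 0.5\ncat -1.0 0.0\n")
vectors, dim = load_vec(sample)
print(dim, sorted(vectors))
```

A .vec file only holds vectors for words seen during training, so looking up an out-of-vocabulary word this way fails; the subword information needed to build such vectors is stored in the .bin file.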
fastText has a method called get_nearest_neighbors for nearest neighbor queries. You need the model's .bin file to use it.