How to find similar words with FastText?

I am playing around with FastText (https://pypi.python.org/pypi/fasttext), which is quite similar to Word2Vec. Since it seems to be a fairly new library without many built-in functions yet, I was wondering how to extract morphologically similar words.

For example: model.similar_word("dog") -> dogs. But there is no such built-in function.

If I type model["dog"], I only get the vector, which could be used to compare cosine similarity: model.cosine_similarity(model["dog"], model["dogs"]).

Do I have to write some sort of loop and compute cosine_similarity on all possible pairs in a text? That would take time!

Using Gensim, load the .vec file trained by fastText with load_word2vec_format, and use the most_similar() method to find similar words!
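For orientation, the .vec file is plain text in word2vec format: a header line with the vocabulary size and vector dimension, then one line per word. Below is a minimal sketch of reading that layout by hand. The read_vec helper and the toy data are hypothetical and only for illustration; in practice gensim's loader does this for you.

```python
import io

def read_vec(handle):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... vd' lines."""
    count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.split()  # also drops a trailing space or newline
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors

# Toy stand-in for model.vec; the numbers are made up for illustration.
toy_vec = io.StringIO("3 2\ndog 0.9 0.1\ndogs 0.8 0.2\ncar 0.1 0.9\n")
vectors = read_vec(toy_vec)
print(sorted(vectors))
```

With a real file you would pass open('model.vec', encoding='utf-8') instead of the StringIO object.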

You can install the pyfasttext library to extract the words most similar (nearest) to a particular word.

from pyfasttext import FastText
model = FastText('model.bin')
model.nearest_neighbors('dog', k=2000)

Or you can get the latest development version of fasttext, which you can install from the GitHub repository:

import fasttext
model = fasttext.load_model('model.bin')
model.get_nearest_neighbors('dog', k=100)

You should use gensim to load the model.vec and then get similar words:

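import gensim
# Newer gensim releases deprecate Word2Vec.load_word2vec_format in favor of KeyedVectors
m = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
m.most_similar(...)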

You can install and import the gensim library and then use it to extract the most similar words from the model that you downloaded from FastText.

Use this:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
similar = model.most_similar(positive=['man'],topn=10)

The topn parameter sets how many results are returned; here you get the 10 most similar words.
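For intuition about what such a query does, here is a rough pure-Python sketch: score every other word by cosine similarity to the query vector and keep the topn best. One pass over the vocabulary is enough, so there is no need to loop over all pairs. This is not gensim's actual implementation; the cosine and most_similar helpers and the toy vectors are made up for illustration.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word` and keep the topn best."""
    query = vectors[word]
    scored = ((cosine(query, vec), other)
              for other, vec in vectors.items() if other != word)
    return [(other, score) for score, other in heapq.nlargest(topn, scored)]

# Toy 3-dimensional vectors, invented for illustration only.
toy = {
    "dog": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "cat": [0.6, 0.4, 0.1],
    "car": [0.0, 0.1, 0.9],
}
neighbors = most_similar(toy, "dog", topn=2)
print(neighbors)
```

On a real vocabulary the same ranking is usually done with one matrix-vector product over normalized vectors, which is why the library call is much faster than a hand-written loop.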

Use gensim:

from gensim.models import FastText

model = FastText.load(PATH_TO_MODEL)
model.wv.most_similar(positive=['dog'])

More info here

Fasttext has a method called get_nearest_neighbors for nearest neighbor queries. You need the model's .bin file to use it.

