
Continue training a FastText model

I have downloaded a .bin FastText model, and I use it with gensim as follows:

from gensim.models import FastText
model = FastText.load_fasttext_format("cc.fr.300.bin")

I would like to continue training the model to adapt it to my domain. After checking FastText's GitHub and the Gensim documentation, it seems that this is not currently feasible, apart from using this person's proposed modification (not yet merged).

Am I missing something?

You can continue training in some versions of Gensim's fastText (for example, v.3.7.*). Here is an example of "Loading, inferring, continuing training":

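from gensim.models.fasttext import load_facebook_model
from gensim.test.utils import datapath

# Load the full model (not just the vectors) so that training can continue.
model = load_facebook_model(datapath("crime-and-punishment.bin"))
sent = [['lord', 'of', 'the', 'rings'], ['lord', 'of', 'the', 'semi-groups']]
model.build_vocab(sent, update=True)  # add the new words to the existing vocabulary
model.train(sentences=sent, total_examples=len(sent), epochs=5)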
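The two update steps above (extend the vocabulary, then train) can be wrapped in a small helper. This is only a sketch: the name continue_training is hypothetical and not part of Gensim, and it assumes the model object exposes build_vocab(..., update=True) and train(...) as in the example above. The corpus is passed positionally because the keyword for it appears to differ between Gensim releases.

```python
def continue_training(model, sentences, epochs=5):
    """Update an already-loaded model with new tokenized sentences.

    `continue_training` is a hypothetical helper, not a Gensim function.
    `model` is assumed to provide build_vocab() and train() like the
    FastText model in the example above.
    """
    # Materialize the corpus: it is iterated twice and its length is needed.
    sentences = list(sentences)

    # Step 1: merge any new words into the existing vocabulary.
    model.build_vocab(sentences, update=True)

    # Step 2: run additional training epochs on the new data only.
    model.train(sentences, total_examples=len(sentences), epochs=epochs)
    return model


# Usage sketch (requires gensim and a .bin model file, so left commented out):
# from gensim.models.fasttext import load_facebook_model
# model = load_facebook_model("cc.fr.300.bin")
# continue_training(model, [["mon", "texte", "de", "domaine"]], epochs=5)
```

Note that training only on a small domain corpus shifts the vectors of the words that occur in it, so it is worth checking a few similarity queries before and after the update.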

For some reason, gensim.models.fasttext.load_facebook_model() is missing on Windows but exists in the Mac installation. Alternatively, one can use gensim.models.FastText.load_fasttext_format() to load a pre-trained model and continue training.
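Since the available loader seems to vary between installations, one option is to check at runtime which of the two functions mentioned above exists. The sketch below does that; pick_fasttext_loader is a hypothetical helper name, and the assumption that the difference comes from the installed Gensim version rather than the operating system is mine, not something the answer confirms.

```python
import importlib


def pick_fasttext_loader():
    """Return (name, loader) for the .bin loader this gensim install exposes.

    Hypothetical helper: returns None if gensim is not installed or if
    neither loader mentioned in the answer is available.
    """
    try:
        ft = importlib.import_module("gensim.models.fasttext")
    except ImportError:
        return None

    # Prefer the module-level function used in the example above.
    loader = getattr(ft, "load_facebook_model", None)
    if loader is not None:
        return "load_facebook_model", loader

    # Fall back to the classmethod used in the question.
    fasttext_cls = getattr(ft, "FastText", None)
    legacy = getattr(fasttext_cls, "load_fasttext_format", None)
    if legacy is not None:
        return "load_fasttext_format", legacy

    return None


found = pick_fasttext_loader()
if found is None:
    print("No FastText .bin loader found in this environment")
else:
    print("Using", found[0])
```

Either loader can then be called with the path to the .bin file, for example found[1]("cc.fr.300.bin").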

Here are various pre-trained Wiki word models and vectors (or here).

Another example. "Note: As in the case of Word2Vec, you can continue to train your model while using Gensim's native implementation of fastText."

The official FastText implementation does not currently support this, although you can find an open ticket related to this issue here.
