
Continue training a FastText model

I have downloaded a .bin FastText model, and I use it with gensim as follows:

from gensim.models import FastText
model = FastText.load_fasttext_format("cc.fr.300.bin")

I would like to continue training the model to adapt it to my domain. After checking FastText's GitHub and the Gensim documentation, it seems this is not currently feasible, apart from using this person's proposed modification (not yet merged).

Am I missing something?

You can continue training in some versions of Gensim's fastText (for example, v3.7.*). Here is an example of "Loading, inferring, continuing training":

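from gensim.models.fasttext import load_facebook_model
from gensim.test.utils import datapath
# Load a Facebook-format .bin model (here, a small model shipped with Gensim's test data)
model = load_facebook_model(datapath("crime-and-punishment.bin"))
# New in-domain sentences, already tokenized
sent = [['lord', 'of', 'the', 'rings'], ['lord', 'of', 'the', 'semi-groups']]
# Add the new words to the existing vocabulary, then continue training
model.build_vocab(sent, update=True)
# Corpus passed positionally so the call works whether the parameter is named sentences or corpus_iterable
model.train(sent, total_examples=len(sent), epochs=5)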

For some reason, gensim.models.fasttext.load_facebook_model() is missing on Windows but exists in the Mac installation. Alternatively, you can use gensim.models.FastText.load_fasttext_format() to load a pre-trained model and continue training.
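Since which loader is available depends on the installed Gensim release, one option is to pick whichever function exists at runtime. The sketch below is a hypothetical helper (`get_facebook_loader` is not part of Gensim): it prefers `load_facebook_model` and falls back to `FastText.load_fasttext_format`, returning `None` if neither can be found.

```python
import importlib


def get_facebook_loader():
    """Hypothetical helper: return a callable that loads a Facebook-format .bin
    fastText model using whatever the installed Gensim provides, or None."""
    try:
        fasttext_mod = importlib.import_module("gensim.models.fasttext")
    except ImportError:
        # Gensim (or its fastText module) is not installed.
        return None

    # Preferred: the module-level function, when this release has it.
    loader = getattr(fasttext_mod, "load_facebook_model", None)
    if loader is not None:
        return loader

    # Fallback: the FastText.load_fasttext_format classmethod, if present.
    fasttext_cls = getattr(fasttext_mod, "FastText", None)
    return getattr(fasttext_cls, "load_fasttext_format", None)


loader = get_facebook_loader()
if loader is None:
    print("No Facebook-format loader found; install or upgrade gensim.")
else:
    print("Using loader:", loader.__name__)
    # model = loader("cc.fr.300.bin")  # then build_vocab(..., update=True) and train(...)
```

Keeping the fallback in one place means the rest of a training script does not need to know which Gensim version is installed.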

Here are various pre-trained Wiki word models and vectors (or here).

Another example: "Note: As in the case of Word2Vec, you can continue to train your model while using Gensim's native implementation of fastText."

The official FastText implementation does not currently support this, although you can find an open ticket about this issue here.
