Unable to load gensim Fasttext model - UTF-8 unicode error

I have trained a FastText model for French using the Gensim library. Suddenly, the trained model can no longer be loaded into memory.

I am using the code below:

from gensim.models import FastText
fname = "filename"
model = FastText.load(fname)

and it throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/gensim/models/fasttext.py", line 1070, in load
    model = super(FastText, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 1244, in load
    model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 603, in load
    return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/gensim/utils.py", line 426, in load
    obj = unpickle(fname)
  File "/usr/local/lib/python3.7/site-packages/gensim/utils.py", line 1384, in unpickle
    return _pickle.load(f, encoding='latin1')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 14072054: invalid start byte

As this model was trained on a large dataset, is there any way to recover or load it?

This error means that some of the data stored in your model file is not valid UTF-8: the byte 0x86 reported in the traceback cannot start a UTF-8 character.
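To make the error message concrete, here is a minimal sketch using only the Python standard library. The helper name `try_decode_utf8` and the sample bytes are hypothetical and not taken from the model file; the sketch only shows why a lone 0x86 byte is rejected by the UTF-8 decoder.

```python
def try_decode_utf8(raw: bytes):
    """Return (text, None) if raw is valid UTF-8, else (None, the decode error)."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


# Properly encoded French text decodes without problems.
text, err = try_decode_utf8("modèle".encode("utf-8"))

# 0x86 is a continuation byte, so it cannot begin a UTF-8 character.
bad_text, bad_err = try_decode_utf8(b"mod\x86le")

if bad_err is not None:
    # bad_err.start is the offset of the offending byte, like the
    # "position 14072054" in the traceback above.
    print(f"byte at offset {bad_err.start}: {bad_err.reason}")
```

The offset reported by the exception can help locate the damaged region of a file, for instance to check whether the file was truncated or altered after it was saved.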

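A possible fix for an already trained model is to pass the unicode_errors flag when loading the model. Note that unicode_errors is documented for KeyedVectors.load_word2vec_format; whether FastText.load accepts it may depend on your gensim version: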

from gensim.models import FastText
fname = "filename"
model = FastText.load(fname, unicode_errors='ignore')

This will, however, silently drop the bytes that cannot be decoded, so the affected words may be altered or lost, which may not be ideal.
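As a rough illustration of what "ignoring" means, the sketch below applies Python's standard decode error handlers to a made-up byte string. It assumes the flag is ultimately handed to a UTF-8 decode call, as the name suggests; the sample bytes are hypothetical.

```python
# A made-up token containing one byte that is not valid UTF-8.
raw = b"fran\x86ais"

# 'ignore' drops the undecodable byte, so the token silently changes.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' keeps a visible marker (U+FFFD) where the bad byte was.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

With 'ignore' the damaged token may collide with another vocabulary entry or become a word that never occurred in the corpus, whereas 'replace' at least makes the damage easy to spot afterwards.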

A better option would be to re-train the model with a UTF-8 compliant setup, at the cost of repeating the training.
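One way to approach a "UTF-8 compliant setup" is to read the training corpus with an explicit encoding, so that invalid bytes fail loudly before training starts. The sketch below is only an assumption about how the corpus might be stored (one sentence per line, whitespace-separated tokens); `iter_utf8_sentences` and the temporary file are hypothetical.

```python
import os
import tempfile


def iter_utf8_sentences(path):
    """Yield token lists from a text file, raising UnicodeDecodeError on invalid UTF-8."""
    with open(path, encoding="utf-8", errors="strict") as handle:
        for line in handle:
            tokens = line.split()
            if tokens:  # skip blank lines
                yield tokens


# Small demo on a temporary file standing in for the real corpus.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("le modèle\n\nest entraîné\n")
    corpus_path = tmp.name

try:
    sentences = list(iter_utf8_sentences(corpus_path))
finally:
    os.remove(corpus_path)

print(sentences)
```

The resulting token lists are the kind of input gensim models are usually trained on. Since a generator can only be consumed once, a real run would probably need a list or a restartable iterable instead.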
