Cannot load model with gensim FastText

I'm having trouble loading a model with gensim.models.FastText.load().

Here is the code and the error I get:

from gensim.models import FastText

class FastTextModel:
    def __init__(self, model_path, dim=300):
        self.dim = dim
        self.model = FastText.load(model_path).wv

...

class GeneralModel:
    def __init__(self, config):
        if config["type"] == "fasttext":
            # path - path to model
            # dim - dimension, here 300
            self.model = FastTextModel(config["path"], config["dim"])

  File "/project/preprocessing/pipeline.py", line 15, in __init__
    self.model_ru = GeneralModel(config["models"]["ru"])
  File "/project/models/nlp_models.py", line 101, in __init__
    self.model = FastTextModel(config["path"], config["dim"])
  File "/project/models/nlp_models.py", line 16, in __init__
    self.model = FastText.load(model_path).wv
  File "/usr/local/lib64/python3.6/site-packages/gensim/models/fasttext.py", line 936, in load
    model = super(FastText, cls).load(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/gensim/models/base_any2vec.py", line 1244, in load
    model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/gensim/models/base_any2vec.py", line 603, in load
    return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/gensim/utils.py", line 423, in load
    obj._load_specials(fname, mmap, compress, subname)
  File "/usr/local/lib64/python3.6/site-packages/gensim/utils.py", line 453, in _load_specials
    getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
  File "/usr/local/lib64/python3.6/site-packages/gensim/utils.py", line 464, in _load_specials
    val = np.load(subname(fname, attrib), mmap_mode=mmap)
  File "/usr/local/lib64/python3.6/site-packages/numpy/lib/npyio.py", line 447, in load
    pickle_kwargs=pickle_kwargs)
  File "/usr/local/lib64/python3.6/site-packages/numpy/lib/format.py", line 738, in read_array
    array.shape = shape
ValueError: cannot reshape array of size 67239904 into shape (445446,300)

I downloaded the models from a Google Drive folder and thought the download might somehow have damaged the .npy files (they are quite big), so I downloaded each file separately (there are 7 files for that model), but this didn't help.

I also read that this error can sometimes be caused by bad unzipping inside the 'load' method, but I'm passing already-unzipped files to it, so that explanation doesn't seem to apply either.

I would be grateful for any help!

Where did the model(s) originate? The gensim FastText.load() method is only for FastText models created & saved from gensim (via its .save() method). Such models use a combination of Python-pickling & sibling .npy raw-array files (to store large arrays) which must be kept together.
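If the files really are gensim-native, it may be worth checking whether each sibling .npy file is complete. In the traceback, the header declares 445446 * 300 = 133,633,800 values but the array that was read has only 67,239,904, which is not even a multiple of 300, so a truncated or mismatched .npy file is a plausible suspect. The sketch below compares the size a .npy header declares with the bytes actually on disk. The function name npy_payload_check is made up for illustration, it parses the header by hand rather than through numpy, and it assumes a plain numeric dtype such as '<f4'.

```python
import ast
import os
import re
import struct


def npy_payload_check(path):
    """Return (expected_bytes, actual_bytes) for the data section of a .npy file.

    Hypothetical helper; handles only plain numeric dtypes such as '<f4'.
    """
    with open(path, "rb") as fh:
        if fh.read(6) != b"\x93NUMPY":
            raise ValueError("not a .npy file: %s" % path)
        major, _minor = fh.read(2)
        # Format version 1.x stores the header length in 2 bytes, later versions in 4.
        if major == 1:
            (header_len,) = struct.unpack("<H", fh.read(2))
        else:
            (header_len,) = struct.unpack("<I", fh.read(4))
        # The header is a Python dict literal with 'descr', 'fortran_order' and 'shape'.
        header = ast.literal_eval(fh.read(header_len).decode("latin1").strip())
        data_start = fh.tell()

    descr = header["descr"]
    match = None
    if isinstance(descr, str):
        match = re.fullmatch(r"[<>|=]?[biufc](\d+)", descr)
    if match is None:
        raise ValueError("unsupported dtype descriptor: %r" % (descr,))
    item_size = int(match.group(1))

    # Number of elements the header promises.
    count = 1
    for dim in header["shape"]:
        count *= dim

    expected = count * item_size
    actual = os.path.getsize(path) - data_start
    return expected, actual


# Usage (the path is a placeholder):
# expected, actual = npy_payload_check("/path/to/some_array.npy")
# if actual < expected:
#     print("file looks truncated: %d of %d bytes" % (actual, expected))
```

If actual comes out smaller than expected for any of the files, re-downloading that file is the first thing to try.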

Models saved from Facebook's original FastText implementation are a different format, for which you'd use the load_facebook_model() utility function:

https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_model

If you only need the vectors – as seems to be the case from your immediate use of only the .wv property – you can also use the load_facebook_vectors() function:

https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_vectors
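As a rough sketch of how the two functions might be used, assuming the downloaded file is a native Facebook-format .bin model and that the installed gensim version provides both functions (the helper name load_fasttext and its vectors_only flag are hypothetical, not part of gensim):

```python
def load_fasttext(path, vectors_only=True):
    """Load a Facebook-format FastText .bin file with gensim.

    Hypothetical helper: only the two gensim functions it calls are
    real library API.
    """
    # Imported here so that merely defining the helper does not require gensim.
    from gensim.models.fasttext import load_facebook_model, load_facebook_vectors

    if vectors_only:
        # Returns keyed vectors: enough for lookups, not for further training.
        return load_facebook_vectors(path)
    # Returns a full FastText model object.
    return load_facebook_model(path)


# Usage (the path is a placeholder):
# wv = load_fasttext("/path/to/model.bin")
# print(wv.vector_size)
```

Loading only the vectors should be sufficient for the code in the question, since it keeps nothing but .wv.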

(Also, not sure why you've wrapped the loaded model in your own FastTextModel class which allows the caller to specify a dimensionality. You can't change the dimensionality of a loaded model, so it'd make more sense to just read the existing vector_size from the model, rather than specify it outside.)
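A minimal sketch of that suggestion, keeping your wrapper but dropping the dim argument. The loader parameter is a hypothetical addition so the same class can be pointed at either loading function; by default it mirrors the FastText.load(...).wv call from the question.

```python
class FastTextModel:
    """Thin wrapper exposing the loaded vectors and their dimensionality."""

    def __init__(self, model_path, loader=None):
        if loader is None:
            # Default mirrors the question: a gensim-native model written by .save().
            from gensim.models import FastText

            def loader(path):
                return FastText.load(path).wv

        self.model = loader(model_path)
        # Read the dimensionality from the loaded vectors instead of
        # accepting it from the caller.
        self.dim = self.model.vector_size


# Usage with a Facebook-format file (the path is a placeholder):
# from gensim.models.fasttext import load_facebook_vectors
# ft = FastTextModel("/path/to/model.bin", loader=load_facebook_vectors)
# print(ft.dim)
```

This way dim can never disagree with the vectors that were actually loaded.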
