Load gensim Word2Vec computed in Python 2, in Python 3

I have a gensim Word2Vec model computed in Python 2 like this:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

model = Word2Vec(LineSentence('enwiki.txt'), size=100, 
                 window=5, min_count=5, workers=15)
model.save('w2v.model')

However, I need to use it in Python 3. If I try to load it,

import gensim
from gensim.models import Word2Vec
model = Word2Vec.load('w2v.model')

it results in an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf9 in position 0: ordinal not in range(128)

I suppose the problem lies in the encoding differences between Python 2 and Python 3. It also seems that gensim uses pickle to save and load models.

Is there a way to set encoding/pickle options so that the model loads properly? Or maybe use some external tool to convert the model file?

Recomputing it in Python 3 is not an option: it takes way too much time.

This does look like a bug somewhere, as memoselyk noted, and it can be fixed in the way described in a comment to this answer.

So you have to add encoding='latin1' to the call to _pickle.loads in gensim.utils.unpickle, load the model in Python 3, and then save it again. After that you can revert the fix and load the newly saved model with unmodified gensim in Python 3.
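Why encoding='latin1' helps can be seen without gensim. The sketch below is only an illustration. The byte string is hand-written to mimic a protocol 2 pickle of a Python 2 str starting with byte 0xf9, and the helper name load_py2_pickle is hypothetical, not part of gensim or pickle.

```python
import pickle

# Assumption: a hand-built protocol-2 pickle of the Python 2 byte string
# '\xf9abc' (opcodes: PROTO 2, SHORT_BINSTRING of length 4, BINPUT 0, STOP).
PY2_PICKLE = b'\x80\x02U\x04\xf9abcq\x00.'


def load_py2_pickle(data):
    """Hypothetical helper: unpickle Python 2 data under Python 3.

    latin1 maps every byte 0x00-0xff to a code point, so decoding
    Python 2 str objects never fails.
    """
    return pickle.loads(data, encoding='latin1')


# By default Python 3 decodes Python 2 str objects as ASCII, which fails
# on byte 0xf9 with the same kind of error as in the question.
try:
    pickle.loads(PY2_PICKLE)
    default_load_failed = False
except UnicodeDecodeError:
    default_load_failed = True

value = load_py2_pickle(PY2_PICKLE)
print(default_load_failed, repr(value))
```

Inside gensim, the same encoding argument goes into the unpickling call mentioned above. Once the model has loaded, calling model.save under Python 3 writes a file that no longer needs the workaround.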
