I trained a doc2vec model with Python 2 and I would like to use it in Python 3.
When I try to load it in Python 3, I get:
Doc2Vec.load('my_doc2vec.pkl')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in range(128)
It seems to be related to a pickle compatibility issue, which I tried to solve by loading the file with pickle directly and saving it again:
import pickle
with open('my_doc2vec.pkl', 'rb') as inf:
    data = pickle.load(inf)
data.save('my_doc2vec_python3.pkl')
Gensim saved other files alongside the model, which I renamed as well so that they can be found when calling:
de = Doc2Vec.load('my_doc2vec_python3.pkl')
This load() no longer fails with UnicodeDecodeError, but inference afterwards gives meaningless results.
I can't easily re-train it using Gensim in Python 3 because I already used this model to create derived data, so I would have to re-run a long and complex pipeline.
How can I make the doc2vec model compatible with Python 3?
Answering my own question, this answer worked for me.
Here are the steps in a bit more detail:
In gensim/utils.py, edit the unpickle function so that it passes the encoding parameter:
return _pickle.loads(f.read(), encoding='latin1')
Using Python 3 and the modified gensim, load the model:
de = Doc2Vec.load('my_doc2vec.pkl')
Save it:
de.save('my_doc2vec_python3.pkl')
The model should now be loadable in Python 3 with unmodified gensim.
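To see why the encoding argument matters, here is a minimal sketch that does not involve gensim at all. It assumes a hand-written protocol-2 pickle of the Python 2 byte string '\xb0abc'; the names PY2_PICKLE and load_py2_pickle are made up for illustration and are not part of gensim or of the model file. By default, Python 3 decodes Python 2 str objects as ASCII, which fails on any byte above 0x7f, whereas latin1 maps every byte value to a code point and so cannot fail.

```python
import pickle

# Assumption: a hand-written protocol-2 pickle, equivalent to what Python 2
# would produce for the byte string '\xb0abc' (SHORT_BINSTRING opcode 'U').
PY2_PICKLE = b'\x80\x02U\x04\xb0abcq\x00.'


def load_py2_pickle(raw, encoding='latin1'):
    """Hypothetical helper: unpickle Python 2 data, decoding its 8-bit
    strings with the given encoding instead of the ASCII default."""
    return pickle.loads(raw, encoding=encoding)


# The default (encoding='ASCII') cannot decode byte 0xb0.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 decodes every byte, so the load succeeds and the original
# bytes can be recovered by encoding back to latin1.
recovered = load_py2_pickle(PY2_PICKLE)
original_bytes = recovered.encode('latin1')
```

The same keyword is what the edited line in gensim/utils.py passes to the pickle loader. Whether latin1 is the right choice for every pickled object is an assumption; it is the usual choice for NumPy arrays and other binary data pickled under Python 2.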