Doc2Vec model Python 3 compatibility

Question

I trained a doc2vec model with Python 2 and I would like to use it in Python 3.

When I try to load it in Python 3, I get:

Doc2Vec.load('my_doc2vec.pkl')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in range(128)

It seems to be related to a pickle compatibility issue, which I tried to solve by doing:

import pickle

with open('my_doc2vec.pkl', 'rb') as inf:
    data = pickle.load(inf)
data.save('my_doc2vec_python3.pkl')

Gensim saved other files which I renamed as well so they can be found when calling

de = Doc2Vec.load('my_doc2vec_python3.pkl')

load() no longer fails with UnicodeDecodeError, but inference with the loaded model gives meaningless results.

I can't easily re-train it using Gensim in Python 3 because I already used this model to create derived data, so I would have to re-run a long and complex pipeline.

How can I make the doc2vec model compatible with Python 3?

Answer 1

Answering my own question, this answer worked for me.

Here are the steps in a bit more detail:

Download the gensim source code, e.g. clone it from the repo.
In gensim/utils.py, edit the unpickle function to add the encoding parameter:
```
return _pickle.loads(f.read(), encoding='latin1')
```
Using Python 3 and the modified gensim, load the model:
```
de = Doc2Vec.load('my_doc2vec.pkl')
```
Save it:
```
de.save('my_doc2vec_python3.pkl')
```

The re-saved model should now be loadable in Python 3 with an unmodified gensim.
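
To see why the encoding parameter matters, here is a minimal sketch that does not involve gensim at all. It assumes the model file contains Python 2 str objects holding non-ASCII bytes, which is consistent with the error in the question. The payload and the helper name load_py2_pickle are made up for illustration and are not part of gensim.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xb0abc'.
# This is an illustrative payload, not data taken from a real model file.
PY2_PAYLOAD = b"\x80\x02U\x04\xb0abcq\x00."


def load_py2_pickle(payload):
    """Unpickle Python 2 data, decoding legacy str objects as latin1."""
    return pickle.loads(payload, encoding="latin1")


# With the default encoding (ASCII), Python 3 cannot decode byte 0xb0.
try:
    pickle.loads(PY2_PAYLOAD)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

# With latin1, every byte maps to exactly one code point, so loading succeeds.
restored = load_py2_pickle(PY2_PAYLOAD)

# Encoding back to latin1 recovers the original bytes unchanged.
round_tripped = restored.encode("latin1")

print(type(default_error).__name__, repr(restored), round_tripped)
```

Because latin1 maps each byte value from 0 to 255 to a single code point, nothing is lost or rejected during decoding, which is why it is the usual choice when a Python 2 pickle carries binary data inside str objects.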

solution1
2 2016-07-20 16:33:27