
Doc2vec: How can I manually modify a trained vector in a Doc2Vec gensim model?

I would like to replace a specific Doc2Vec vector created by a Doc2vec model with another one, with different weights.

These are the weights of the existing vector (only some of the 800 actual weights are shown):

array([ 1.72976881e-01,  2.44364753e-01, -9.90936995e-01, -1.03020036e+00,
       -1.41046381e+00,  1.00970473e-02, -1.84546992e-01,  3.77230316e-01,
        9.20825064e-01, -2.61079431e-01,  7.51454890e-01, -1.15353882e+00,
       -9.96422302e-03,  1.65010715e+00,  5.63869551e-02, -4.25169647e-01],
      dtype=float32)

I'd like to replace them with these ones:

array([ 1.54585496e-01,  2.22857013e-01, -8.88102770e-01, -9.27794874e-01,
       -1.27402091e+00, -5.38651831e-04, -1.63646400e-01,  3.38727772e-01,
        8.28402698e-01, -2.29774594e-01,  6.77914560e-01, -1.04013634e+00,
       -1.37407500e-02,  1.48667252e+00,  5.83136305e-02, -3.88587236e-01],
      dtype=float32)

I tried to add a new vector to my model with this code:

model = gensim.models.Word2Vec.load('mymodel.doc2vec')
model.docvecs.add(entities=["88763"], weights=[new_vector])

I don't get any error, but when I read the "88763" vector back I see that it hasn't been updated:

model.docvecs["88763"]

array([ 1.72976881e-01,  2.44364753e-01, -9.90936995e-01, -1.03020036e+00,
       -1.41046381e+00,  1.00970473e-02, -1.84546992e-01,  3.77230316e-01,
        9.20825064e-01, -2.61079431e-01,  7.51454890e-01, -1.15353882e+00,
       -9.96422302e-03,  1.65010715e+00,  5.63869551e-02, -4.25169647e-01],
      dtype=float32)

Could someone please help me in some way?

Thanks.

Don't load a Doc2Vec model with `Word2Vec`. Load it instead with:

model = gensim.models.Doc2Vec.load('mymodel.doc2vec')

Once loaded, you should be able to modify any existing entry via direct assignment to a bracket-accessed entry, e.g.:

model.docvecs['88763'] = new_vector

(You would chiefly use add() to add vectors for keys that aren't already there. But it might also work to replace existing vectors in a batch if you supply the non-default replace=True parameter in addition to the list-of-entities and list-of-vectors.)

Update: The above is supposed to work, but there's a pending bug at the moment (November 2019, gensim-3.8.1) where it won't.

In the meantime, to modify one specific existing vector, you can act on the raw vectors_docs property, and look up the index-position to change yourself. For example:

slot = model.docvecs.int_index('88763',
                               model.docvecs.doctags,
                               model.docvecs.max_rawint)
model.docvecs.vectors_docs[slot] = new_vector
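
The workaround above boils down to two steps: resolve the tag to its row position, then overwrite that row in the raw storage. The sketch below shows only that pattern on a toy container, so it runs without gensim. `TinyDocVectors`, `replace_vector` and the two-dimensional sample vectors are hypothetical names and data made up for illustration; they are not gensim's classes or internals.

```python
# Toy stand-in for "look up the slot, then overwrite the raw row".
# TinyDocVectors is hypothetical; it is NOT gensim's class or its internals.

class TinyDocVectors:
    def __init__(self, tags, rows):
        # Map each tag to the index of its row, like a doctag lookup table.
        self.slot_of = {tag: i for i, tag in enumerate(tags)}
        # Raw storage: one row of weights per tag.
        self.rows = [list(row) for row in rows]

    def slot(self, tag):
        # Raises KeyError for an unknown tag instead of silently adding it.
        return self.slot_of[tag]

    def __getitem__(self, tag):
        return self.rows[self.slot(tag)]


def replace_vector(store, tag, new_vector):
    """Overwrite the stored row for an existing tag and return its slot."""
    slot = store.slot(tag)
    if len(new_vector) != len(store.rows[slot]):
        raise ValueError("new vector has the wrong number of dimensions")
    store.rows[slot] = list(new_vector)
    return slot


store = TinyDocVectors(["88762", "88763"], [[0.1, 0.2], [0.3, 0.4]])
replace_vector(store, "88763", [0.5, 0.6])
```

With a real model, it is probably worth reading the vector back after the assignment and comparing it to new_vector, since a silent no-op is exactly the symptom described in the question.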
