gensim word2vec - array dimensions in updating with online word embedding
Updating the word vectors on the fly with Word2Vec from gensim 0.13.4.1 does not work.
model.build_vocab(sentences, update=False)
works fine; however,
model.build_vocab(sentences, update=True)
does not.
I am trying to emulate what this website has done, so at some point I use the following script:
model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("./text8/text8")
model.build_vocab(sentences, keep_raw_vocab=False, trim_rule=None, progress_per=10000, update=False)
model.train(sentences)
However, while this runs with update=False, using update=True gives me the following traceback:
Traceback (most recent call last):
  File "word2vecAttempt.py", line 34, in <module>
    model.build_vocab(sentences, progress_per=10000, update=True)
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 535, in build_vocab
    self.finalize_vocab(update=update) # build tables & arrays
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 708, in finalize_vocab
    self.update_weights()
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1070, in update_weights
    self.wv.syn0 = vstack([self.wv.syn0, newsyn0])
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I was able to reproduce your error. I think you're calling build_vocab with update=True when the model has not been trained yet. You should only use update=True on a model that has already been trained.
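The error message itself can be reproduced with NumPy alone, which may help show what goes wrong in the vstack call from the traceback. This is only a sketch: it assumes (I have not confirmed this in the gensim source) that an untrained model holds an empty weight array, and the names existing, new_rows and trained, along with the sizes 3, 5 and 100, are made up for illustration.

```python
import numpy as np

# Assumption: stands in for the weights of a model whose vocabulary
# was never built, i.e. an empty 1-D array.
existing = np.array([])

# Hypothetical rows for newly added words: 3 words, 100 dimensions each.
new_rows = np.zeros((3, 100))

try:
    # vstack promotes `existing` to shape (1, 0), which cannot be
    # stacked on top of rows of width 100.
    np.vstack([existing, new_rows])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Once rows of matching width already exist, the same call succeeds.
trained = np.zeros((5, 100))
stacked = np.vstack([trained, new_rows])
print(stacked.shape)
```

If that assumption holds, building the vocabulary with update=False first gives the weight array its proper width, so the later update has something compatible to append to.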
This works:
import gensim
model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=False)
model.train(sentences)
model.build_vocab(sentences, update=True)
model.train(sentences)
But this will fail:
import gensim
model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=True)
model.train(sentences)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I tested this with the latest version of gensim, 0.13.4.1.
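If the same code path has to handle both the first batch and later batches, one option is a small guard that picks the update flag for you. This is a minimal sketch, not part of gensim: build_or_update_vocab is a hypothetical helper, and it assumes the model exposes its vocabulary as the dict model.wv.vocab, which other gensim versions may name differently.

```python
def build_or_update_vocab(model, sentences):
    """Call build_vocab with update=True only if a vocabulary already exists.

    Hypothetical helper, not a gensim function.
    """
    # Assumption: the vocabulary is a dict at model.wv.vocab and is
    # empty until build_vocab has been called at least once.
    has_vocab = len(model.wv.vocab) > 0
    model.build_vocab(sentences, update=has_vocab)
    # Report which mode was used so the caller can log it.
    return has_vocab
```

Calling build_or_update_vocab(model, sentences) before each model.train(sentences) would then use update=False on the first call and update=True afterwards.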