
Averaging Multiple Word2Vec Models in Gensim

I have trained a Word2Vec model on a few million words using Gensim in Python. I want to update this trained model with new data, but from previous posts and other sources around the web I learned that this is not possible. So I am training multiple models and dumping each one to disk. Now I want to merge the dumped models and use the combined result. I found an earlier post, "Merging pretrained models in Word2Vec?", but I do not understand how to do it. I also learned of a library named deepdist, and I am experimenting along these lines:

from gensim.models import word2vec
model = word2vec.Word2Vec.load_word2vec_format('/tmp/vectors.bin', binary=True)
  1. Is there a possible solution?
  2. If so, could someone suggest how to do it?

I am using Python 2.7 on Windows 7 Professional.

The answer you pointed to does NOT suggest merging the models as a solution. It suggests using your separate models side by side: make a prediction with each model and then combine the answers. There are several ways to combine the outputs. Since you already have several models, you can ignore the part of that answer that suggests splitting your training data in two so as to end up with three models casting predictions. You can use a majority-voting policy as long as you have more than two predictions.
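
As a rough illustration of the majority-voting idea, here is a minimal sketch. It assumes each model's output has already been reduced to a single label per input; the function name `majority_vote` and the sample labels are hypothetical and not part of gensim. How you obtain a prediction from each Word2Vec model depends on your task and is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common prediction, or None on an empty list or a tie for first place."""
    if not predictions:
        return None
    counts = Counter(predictions).most_common()
    # If the top two candidates have the same count, there is no clear majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical labels predicted for the same input by three separately trained models.
per_model_predictions = ["positive", "negative", "positive"]
winner = majority_vote(per_model_predictions)
print(winner)
```

Returning None on a tie is one possible design choice; with an odd number of models and two possible labels a tie cannot occur, which is one reason to prefer three models over two.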


 