
How to use Word2Vec's vocab of one model in another?

I have a Doc2Vec model and I want to create a Word2Vec model with a different dimension. How can I reuse the Doc2Vec model's vocab to speed up training? Is it even feasible to train this way? Does vocab building have any effect on training?

Vocab building is essentially just one pass over the entire dataset and doesn't add much to the training time (unless you are training over billions of words).
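To make the "one pass" point concrete, here is a minimal sketch of what vocab building amounts to: counting every token once and dropping the rare ones. The `build_vocab` helper and the toy corpus are hypothetical illustrations, not part of Gensim or the word2vec tool.

```python
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Count every token in a single pass, then drop words rarer than min_count."""
    counts = Counter()
    for sentence in sentences:      # each sentence is visited exactly once
        counts.update(sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


# Toy corpus (made up for illustration): a list of tokenized sentences.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "bird", "flew"],
]

vocab = build_vocab(corpus, min_count=2)
print(vocab)  # {'the': 3, 'sat': 2}
```

Training, by contrast, usually makes several passes over the same data and updates vectors for every token, which is why this counting step is rarely the bottleneck.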

Gensim's Doc2Vec (to the best of my knowledge) doesn't currently allow creating models from a pre-defined vocabulary. If you are using Mikolov's code for sentence2vec ( https://groups.google.com/d/msg/word2vec-toolkit/Q49FIrNOQRo/J6KG8mUj45sJ ), it will allow you to save the vocab and read it back from a file:

word2vec -save-vocab <file>
word2vec -read-vocab <file>
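If you would rather handle the vocab file yourself, the same save-then-read idea can be sketched in Python. This assumes the file is plain text with one `word count` pair per line, most frequent word first. That layout is an assumption to check against your own vocab file, and `save_vocab` / `read_vocab` are hypothetical helper names.

```python
import os
import tempfile


def save_vocab(counts, path):
    """Write one 'word count' pair per line, most frequent word first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    with open(path, "w", encoding="utf-8") as f:
        for word, n in ordered:
            f.write(f"{word} {n}\n")


def read_vocab(path):
    """Read 'word count' lines back into a dict, skipping blank lines."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            word, n = line.rsplit(" ", 1)   # the count is the last field
            counts[word] = int(n)
    return counts


# Made-up counts for illustration only.
counts = {"the": 3, "sat": 2, "cat": 1}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    save_vocab(counts, path)
    restored = read_vocab(path)

print(restored == counts)  # True
```

Reusing a saved vocab only skips the counting pass. The word vectors still have to be trained from scratch when the dimension changes.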
