How do I load a pre-trained model in gensim and train doc2vec with it?

Question

I have a word2vec model that I have already trained. I have serialized it as a CSV file:

word,  v0,     v1,     ..., vN
house, 0.1234, 0.4567, ..., 0.3461
car,   0.456,  0.677,  ..., 0.3461

What I'd like to know is how I can load that word-vector model in gensim and use it to train a paragraph (doc2vec) model.

This Doc2Vec tutorial says I can load a model in the form of a "# C text format" file, but I have no idea what that actually means. What is "C text format" in the first place? More importantly:

How can I load my word2vec model and use it for doc2vec training?

How do I build the vocabulary from my word2vec model?

Answer 1

Doc2Vec does not need word-vectors as an input: it will create any word-vectors that are needed during its own training. (And some modes, like pure DBOW with dm=0, dbow_words=0, don't use or train word-vectors at all.)

Seeding a Doc2Vec model with word-vectors might help or hurt; there isn't much theory, and there are few published results, to offer guidance. There's an experimental method on Word2Vec, intersect_word2vec_format(), that can merge word2vec-c-format vectors into a model with an existing vocabulary, but you'd need to review the source to really understand its assumptions:

https://github.com/RaRe-Technologies/gensim/blob/51753b95415bbc344ea6af671818277464905ea2/gensim/models/word2vec.py#L1140
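As for what "C text format" means: it is the plain-text layout written by the original word2vec C tool, where the first line holds the vocabulary size and the vector dimensionality, and every following line holds a word followed by its space-separated values. The sketch below converts a CSV laid out like the one in the question into that format using only the standard library. The function name `csv_to_word2vec_text` and the file paths are hypothetical, and it assumes the CSV has one header row and no words containing spaces.

```python
import csv


def csv_to_word2vec_text(csv_path, out_path):
    """Convert a 'word, v0, v1, ...' CSV into word2vec C text format.

    Returns (vocab_size, vector_size). Assumes one header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        # skipinitialspace handles the padding after each comma.
        reader = csv.reader(src, skipinitialspace=True)
        next(reader)  # skip the 'word, v0, v1, ...' header row
        rows = [(row[0].strip(), [float(x) for x in row[1:]])
                for row in reader if row]

    if not rows:
        raise ValueError("no vectors found in " + csv_path)

    dim = len(rows[0][1])
    with open(out_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <vector_size>"
        dst.write(f"{len(rows)} {dim}\n")
        for word, values in rows:
            if len(values) != dim:
                raise ValueError(f"inconsistent vector length for {word!r}")
            dst.write(word + " " + " ".join(repr(v) for v in values) + "\n")

    return len(rows), dim
```

A file written this way should be readable by gensim's `KeyedVectors.load_word2vec_format(path, binary=False)`, and it is the kind of input that `intersect_word2vec_format()` expects. Where that method lives and how it behaves differs between gensim versions, so check the source of the version you have installed.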

Answer 1
1 · Accepted · 2016-07-29 02:38:08
