
How to extract vocabulary vectors from gensim's word2vec?

I want to analyze the vectors to look for patterns, and use an SVM on them for a supervised classification task between class A and class B. (I know it may sound odd, but it's our homework.) So I really need to know:

1- How do I extract the encoded vectors of a document using a trained model?

2- How do I interpret them, and how does word2vec encode them?

I'm using gensim's word2vec.

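  1. If you have a trained word2vec model, you can get a word's vector through the __getitem__ method of model.wv, that is, by indexing it with the word. (In gensim releases before 4.0, indexing the model itself, as in model["word"], also worked.)

    import gensim

    model = gensim.models.Word2Vec(sentences)  # sentences: iterable of token lists
    print(model.wv["some_word_from_dictionary"])  # KeyError if the word is not in the vocabulary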

  2. Unfortunately, embeddings from word2vec/doc2vec cannot be interpreted directly by a person: the individual dimensions have no assigned meaning (in contrast to topic vectors from LdaModel, where each component corresponds to a topic).

P.S. If the objects in your task are whole texts (documents), you should use the Doc2Vec model.
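If Doc2Vec is not an option, a common baseline for turning a document into a single feature vector is to average the vectors of its words. The sketch below is only an illustration of that idea, not part of the original answer: `document_vector` and `toy_vectors` are hypothetical names, and a plain dict with made-up values stands in for a trained model so the snippet runs without gensim. With a real model, the assumption is that you would pass `model.wv`, which supports the same `word in ...` and `...[word]` lookups.

```python
def document_vector(tokens, vectors):
    """Average the vectors of all tokens that are in the vocabulary.

    `vectors` is anything supporting `word in vectors` and `vectors[word]`:
    a plain dict here, or (assumed) `model.wv` from a trained gensim model.
    Returns None when none of the tokens is known.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    # Component-wise mean over the known word vectors
    return [sum(float(v[i]) for v in known) / len(known) for i in range(dim)]


# Toy 3-dimensional "embeddings" with made-up values, standing in for a model
toy_vectors = {
    "good": [1.0, 0.0, 2.0],
    "movie": [3.0, 2.0, 0.0],
}

doc = ["good", "movie", "unknownword"]  # out-of-vocabulary tokens are skipped
features = document_vector(doc, toy_vectors)
print(features)  # [2.0, 1.0, 1.0]
```

One such vector per document, paired with its class label (A or B), could then serve as a training row for an SVM.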
