
How doc2vec creates vector for sentence

Original 2018-10-31 03:47:39 3 2 python/ machine-learning/ data-science/ word2vec/ doc2vec

I am using Doc2Vec for text classification. It creates a vector of a given size for each sentence (e.g. a vector of length 100). I don't understand how it creates a vector of that length.

I am following this link. There, a vector is created for each sentence and saved in the Doc2Vec model. I can't use this model to test new data (production data), because the model has no vector for a new sentence. The error shown for new data is:

KeyError: "tag 'Test_2028' not seen in training corpus/invalid"

2 answers

Doc2Vec concept:

The goal of doc2vec is to create a numeric representation of a document, regardless of its length. But unlike words, documents do not come in a fixed logical structure, so another method has to be found.

The concept that Mikolov and Le used is simple yet clever: they took the word2vec model and added another vector, paragraph_ID, which is unique to each document. Now, instead of using only the context words to predict the next word, the model also uses this additional document vector as a feature.

So, when training the word vectors W, the document vector paragraph_ID is trained as well, and at the end of training it holds a numeric representation of the document.
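To make the idea concrete, here is a toy sketch in plain Python. It is not gensim's implementation: it assumes the simpler PV-DBOW variant (the document vector alone predicts each word of its document) with a full softmax, and the names VECTOR_SIZE, doc_vecs, out_vecs and train_step are made up for illustration. The point it tries to show is that a document vector is just a list of VECTOR_SIZE numbers, initialized randomly and nudged by gradient descent, which is why its length is whatever size you configure.

```python
import math
import random

random.seed(0)
VECTOR_SIZE = 8  # hypothetical; the question uses 100

# Hypothetical two-document corpus, keyed by document tag.
docs = {
    "doc_0": "the cat sat on the mat".split(),
    "doc_1": "the dog sat on the rug".split(),
}
vocab = sorted({w for words in docs.values() for w in words})
word_index = {w: i for i, w in enumerate(vocab)}


def random_vector(size):
    return [random.uniform(-0.5, 0.5) for _ in range(size)]


# One trainable vector per document tag and one output vector per word.
doc_vecs = {tag: random_vector(VECTOR_SIZE) for tag in docs}
out_vecs = [random_vector(VECTOR_SIZE) for _ in vocab]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(doc_vec, target, lr):
    """One SGD step: the document vector tries to predict one of its words."""
    probs = softmax([dot(doc_vec, out) for out in out_vecs])
    grad_doc = [0.0] * len(doc_vec)
    for i, out in enumerate(out_vecs):
        err = probs[i] - (1.0 if i == target else 0.0)
        for k in range(len(doc_vec)):
            grad_doc[k] += err * out[k]       # gradient w.r.t. the document vector
            out[k] -= lr * err * doc_vec[k]   # update the word's output vector
    for k in range(len(doc_vec)):
        doc_vec[k] -= lr * grad_doc[k]


def average_loss():
    """Mean negative log-likelihood of each word given its document vector."""
    total, count = 0.0, 0
    for tag, words in docs.items():
        probs = softmax([dot(doc_vecs[tag], out) for out in out_vecs])
        for w in words:
            total -= math.log(probs[word_index[w]])
            count += 1
    return total / count


initial_loss = average_loss()
for _ in range(200):
    for tag, words in docs.items():
        for w in words:
            train_step(doc_vecs[tag], word_index[w], lr=0.1)
final_loss = average_loss()

print(len(doc_vecs["doc_0"]))      # same as VECTOR_SIZE, whatever the document length
print(initial_loss, final_loss)
```

Real implementations add context windows, negative sampling or hierarchical softmax, and many more documents, but the document vector keeps the same role: a fixed-length row that is trained alongside the word weights.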

You can read more about it here

If you've created a gensim Doc2Vec model with your training data, it will only know trained vectors for the document tags that were present in the training data.

However, there's also the method infer_vector(), which can infer a compatible document vector for a new text. The new text should be tokenized the same way as the training data and passed to infer_vector() as a list of string tokens.

Building Vector for a sentence in doc2vec from an untrained data set

Doc2Vec Sentence Clustering

Doc2vec: How can I manually modify a trained vector in a Doc2Vec gensim model?

Gensim doc2vec sentence tagging

Doc2Vec: Differentiate Sentence and Document

Doc2Vec find the similar sentence

How to measure the word weight using doc2vec vector

How to get the Document Vector from Doc2Vec in gensim 0.11.1?

Getting tags for a vector in the model Doc2Vec

removing randomization of vector initialization for doc2vec


