
What is the meaning of “size” of word2vec vectors [gensim library]?

Original post 2018-12-03 05:29:29 0 1 python/ gensim/ word2vec/ word-embedding

Assume that we have 1000 words (A1, A2, ..., A1000) in a dictionary. As far as I understand, word embedding (the word2vec method) aims to represent each word in the dictionary by a vector in which each element represents the similarity of that word to the remaining words in the dictionary. Is it correct to say that each vector should have 999 dimensions, i.e. that the size of each word2vec vector should be 999?

But with Gensim in Python, we can modify the value of the "size" parameter for Word2Vec, say size = 100. So what does "size=100" mean? If we extract the output vector of A1, denoted (x1, x2, ..., x100), what do x1, x2, ..., x100 represent in this case?

1 answer

It is not the case that "[word2vec] aims to represent each word in the dictionary by a vector where each element represents the similarity of that word with the remaining words in the dictionary".

Rather, given a certain target dimensionality, say 100, the Word2Vec algorithm gradually trains 100-dimensional word-vectors to be better and better at its training task, which is predicting nearby words.
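To make "predicting nearby words" concrete, here is a minimal sketch of how (center word, nearby word) training pairs could be generated in the skip-gram flavour of the task. The function name `skipgram_pairs` and the fixed window are illustrative assumptions, not Gensim's internals (real implementations also subsample frequent words and vary the effective window).

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sentence; each pair is one "predict the neighbour" example.
pairs = skipgram_pairs(["the", "king", "rules", "the", "land"], window=1)
for center, context in pairs:
    print(center, "->", context)
```

Note that nothing here depends on the vocabulary size: the vectors being trained have whatever dimensionality was chosen up front, and the pairs only drive how those vectors get nudged.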

This iterative process tends to force words that are related to be "near" each other, in rough proportion to their similarity. Further, the various "directions" in this 100-dimensional space often tend to match human-perceivable semantic categories. So the famous "wv(king) - wv(man) + wv(woman) ~= wv(queen)" example often works because "maleness/femaleness" and "royalty" are vaguely consistent regions/directions in the space.
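The analogy arithmetic can be sketched with a few hand-made vectors. Everything below is a toy assumption: the 3-dimensional values are invented for readability (and are axis-aligned, which real trained vectors are not), and the nearest word is picked by cosine similarity after excluding the input words.

```python
import math

# Hypothetical toy vectors, NOT output from a trained model.
vectors = {
    "king":  [0.8, 0.9, 0.1],
    "man":   [0.1, 0.9, 0.2],
    "woman": [0.1, 0.1, 0.2],
    "queen": [0.7, 0.2, 0.1],
    "apple": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# wv(king) - wv(man) + wv(woman)
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Exclude the query words themselves, as analogy evaluations usually do.
candidates = {w: v for w, v in vectors.items()
              if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(best)  # queen
```

With a trained Gensim model, the corresponding query would be `model.wv.most_similar(positive=["king", "woman"], negative=["man"])`; whether "queen" actually comes out on top depends on the training data.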

The individual dimensions, alone, don't mean anything. The training process includes randomness, and over time just does "whatever works". The meaningful directions are not perfectly aligned with the dimension axes, but angled through all the dimensions. (That is, you're not going to find that v[77] is a gender-like dimension. Rather, if you took dozens of alternate male-like and female-like word pairs, and averaged all their differences, you might find some 100-dimensional direction vector that is suggestive of gender.)
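The "average the differences of many word pairs" idea can be simulated without a trained model. The sketch below is built on synthetic data under stated assumptions: a hypothetical "gender" direction spread evenly across all 100 axes, 20 made-up (male-like, female-like) vector pairs offset along it, and a little Gaussian noise. It then checks that the averaged difference recovers the direction while no single axis dominates.

```python
import math
import random

DIM = 100
rng = random.Random(0)

# Assumed ground-truth direction: unit length, angled through every axis.
true_dir = [rng.choice((-1.0, 1.0)) / math.sqrt(DIM) for _ in range(DIM)]


def make_pair():
    """Synthetic (male-like, female-like) vectors sharing a random base."""
    base = [rng.gauss(0, 1) for _ in range(DIM)]
    male = [b + 0.5 * d + rng.gauss(0, 0.05) for b, d in zip(base, true_dir)]
    female = [b - 0.5 * d + rng.gauss(0, 0.05) for b, d in zip(base, true_dir)]
    return male, female


pairs = [make_pair() for _ in range(20)]

# Average the per-pair difference vectors.
avg_diff = [0.0] * DIM
for male, female in pairs:
    for i in range(DIM):
        avg_diff[i] += (male[i] - female[i]) / len(pairs)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


alignment = cosine(avg_diff, true_dir)
norm = math.sqrt(sum(x * x for x in avg_diff))
largest_axis_share = max(abs(x) for x in avg_diff) / norm

print(f"alignment with the assumed direction: {alignment:.3f}")
print(f"largest single-axis share of the direction: {largest_axis_share:.3f}")
```

In this simulation the recovered direction lines up closely with the assumed one, yet each individual coordinate carries only a small share of it, which is the sense in which a lone index like v[77] is not interpretable.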

You can pick any 'size' you want, but 100-400 are common values when you have enough training data.
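As a rough illustration of what the parameter controls, the sketch below trains a throwaway model and inspects one vector. The toy corpus and variable names are made up for the example. Since Gensim 4.0 the parameter is called `vector_size`; older releases call it `size`, so the code tries both. The import is guarded so the snippet is simply skipped if Gensim is not installed.

```python
try:
    from gensim.models import Word2Vec
except ImportError:  # Gensim not available; skip the demonstration
    Word2Vec = None

# Hypothetical tiny corpus, repeated so every word is seen many times.
sentences = [
    ["king", "queen", "man", "woman"],
    ["man", "woman", "child"],
] * 50

vec = None
if Word2Vec is not None:
    try:
        # Gensim >= 4.0
        model = Word2Vec(sentences, vector_size=100, window=2,
                         min_count=1, seed=1, workers=1)
    except TypeError:
        # Gensim < 4.0 used the name `size`
        model = Word2Vec(sentences, size=100, window=2,
                         min_count=1, seed=1, workers=1)
    vec = model.wv["king"]
    # One 100-element vector per word, regardless of vocabulary size.
    print(vec.shape, len(model.wv.index_to_key)
          if hasattr(model.wv, "index_to_key") else "")
```

The vocabulary here has only five words, yet each vector still has 100 elements: the dimensionality is a free choice, not a function of how many words there are. A corpus this small will not yield meaningful vectors; it only demonstrates the shape.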


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 2 ACCEPTED 2018-12-03 20:47:17
