Using Doc2Vec, I would like to see the impact of each word on the generated matrices.
Is there a way to see a detailed representation of a matrix, i.e. the content of the matrix and, in particular, what is represented by each row and each column?
For example, this way I can see the matrix's values, but not any description of the rows and columns:
user_vector = model.infer_vector(doc_words=normalized_code, steps=500, alpha=0.025)
print ('user_vector',user_vector)
('user_vector', array([ 0.24641024, -0.34768087, 0.02094658, -0.06164126, 0.13432615,
-0.22375308, -0.16741623, -0.2827304 , 0.04730519, 0.19883735,
-0.27629316, 0.00847638, 0.03568176, -0.31764287, -0.38039216,
0.08650897, 0.3766149 , 0.09078006, -0.1676072 , -0.1324272 ],
dtype=float32))
As a "dense embedding", the individual dimensions of a Doc2Vec (or Word2Vec) vector don't have clearly-describable interpretations.
The vectors are just in relative positions that work well for the training task – and fortunately for us, those same relative positions can correlate fairly well with our senses of word-similarity, and even with "neighborhoods" or "directions" of common meaning.
But interesting semantic concepts, such as the 'royal leader' or 'gender' concepts captured by the famous Word2Vec example vec['king'] - vec['man'] + vec['woman'] ~close-to~ vec['queen'], aren't aligned with exact dimensions/axes.
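To make that concrete, here is a minimal sketch with hand-made toy vectors (the values are contrived purely for illustration – real Word2Vec vectors have hundreds of learned dimensions). Note that the 'gender' shift touches every dimension, not a single labeled axis:

```python
import numpy as np

# Contrived 4-dimensional "word vectors" (hypothetical values for
# illustration only; real trained vectors would differ).
vec = {
    'king':  np.array([0.7,  0.3, -0.4,  0.5]),
    'man':   np.array([0.6, -0.2, -0.3,  0.1]),
    'woman': np.array([0.1,  0.4,  0.2,  0.2]),
    'queen': np.array([0.2,  0.9,  0.1,  0.6]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The analogy arithmetic: king - man + woman lands near queen...
result = vec['king'] - vec['man'] + vec['woman']
print(cosine(result, vec['queen']))

# ...but the "gender" direction (woman - man) is spread across all
# four dimensions rather than sitting on any one axis.
print(vec['woman'] - vec['man'])
```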
So, the "row" you're seeing is just all the dimensions for a single vector, and each "column" is a dimension that's co-equal with any other, and not generally label-able.
(If you were to synthesize a new, similar document with a few different words, it'd get a different doc-vector – but the shifts probably wouldn't be tightly limited to any few dimensions.)
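The "co-equal dimensions" point can even be checked numerically: rotating the whole embedding space by any orthogonal matrix changes every individual coordinate, yet leaves all pairwise cosine similarities – the relative positions that actually matter – untouched. A sketch, using random vectors as stand-ins for real inferred doc-vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for five 20-dimensional doc-vectors (random here; real
# vectors would come from something like model.infer_vector).
docs = rng.standard_normal((5, 20))

# A random orthogonal rotation, obtained via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
rotated = docs @ Q

def cosine_matrix(X):
    """All pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

# Same similarity structure, completely different per-dimension values.
print(np.allclose(cosine_matrix(docs), cosine_matrix(rotated)))  # True
print(np.allclose(docs, rotated))                                # False
```

Because any such rotation produces an equally valid embedding, no individual column can carry a fixed, label-able meaning.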