Doc2vec matrix representation

Question

Using Doc2vec, I would like to see the impact of each word in the generated matrices.

Is there a way to see the detailed representation of a matrix, i.e. the content of the matrix and, above all, what each row and each column represents?

For example, this way I can see the matrix representation, but not the column and row descriptions:

# Note: in gensim 4.x the `steps` argument of infer_vector is named `epochs`.
user_vector = model.infer_vector(doc_words=normalized_code, steps=500, alpha=0.025)
print('user_vector', user_vector)

('user_vector', array([ 0.24641024, -0.34768087,  0.02094658, -0.06164126,  0.13432615,
       -0.22375308, -0.16741623, -0.2827304 ,  0.04730519,  0.19883735,
       -0.27629316,  0.00847638,  0.03568176, -0.31764287, -0.38039216,
        0.08650897,  0.3766149 ,  0.09078006, -0.1676072 , -0.1324272 ],
      dtype=float32))

Answer 1

As a "dense embedding", the individual dimensions of a Doc2Vec (or Word2Vec) vector don't have clearly describable interpretations.

The vectors are just in relative positions that work well for the training task, and fortunately for us, those same relative positions can correlate fairly well with our sense of word similarity, and even with "neighborhoods" or "directions" of common meaning.

But interesting semantic concepts, such as the 'royal leader' or 'gender' concepts captured by the famous Word2Vec vec['king'] - vec['man'] + vec['woman'] ~close-to~ vec['queen'] example, aren't aligned with exact dimensions/axes.
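To make the point about directions concrete, here is a toy sketch in plain Python. The 2-d vectors are invented for illustration (they do not come from any trained model), and the helper `cosine` is my own. The "gender" offset is deliberately placed on a diagonal, so the analogy works even though no single axis means "gender".

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 2-d "word vectors" (assumption: purely illustrative values).
vec = {
    "man":   (1.0, 0.0),
    "woman": (1.5, 0.5),
    "king":  (1.5, -0.5),
    "queen": (2.0, 0.0),
    "apple": (-1.0, 0.3),
    "car":   (0.2, -1.0),
}

# king - man + woman, computed component by component.
target = tuple(k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"]))

# Rank the remaining words by cosine similarity to the target.
candidates = [w for w in vec if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(vec[w], target))

# The "gender" direction is spread over both coordinates, not tied to one axis.
gender_offset = tuple(w - m for w, m in zip(vec["woman"], vec["man"]))

print("closest to king - man + woman:", best)
print("woman - man offset:", gender_offset)
```

In a real model the same idea holds in 100+ dimensions: the meaningful direction is some combination of many coordinates rather than any one of them.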

So, the "row" you're seeing is just all the dimensions for a single vector, and each "column" is a dimension that's co-equal with any other, and not generally label-able.

(If you were to synthesize a new, similar document with a few different words, it'd get a different doc-vector, but the shifts probably wouldn't be tightly limited to any few dimensions.)
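One way to see why the columns can't be labelled: rotating the whole vector space changes every coordinate but leaves all cosine similarities intact, so a rotated set of vectors would serve similarity comparisons equally well. The sketch below uses made-up 2-d vectors and a hypothetical `rotate` helper; it is plain Python for illustration, not anything Doc2Vec itself does.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rotate(v, theta):
    """Rotate a 2-d vector counter-clockwise by theta radians."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Made-up 2-d "doc vectors" (assumption: illustrative values only).
docs = {"doc_a": (0.9, 0.1), "doc_b": (0.8, 0.3), "doc_c": (-0.2, 0.7)}

# Apply the same arbitrary rotation to every vector.
theta = math.radians(40)
rotated = {name: rotate(v, theta) for name, v in docs.items()}

for a, b in [("doc_a", "doc_b"), ("doc_a", "doc_c")]:
    before = cosine(docs[a], docs[b])
    after = cosine(rotated[a], rotated[b])
    print(a, b, "similarity before:", round(before, 6), "after:", round(after, 6))

# The individual coordinates, however, are completely different.
print("doc_a original:", docs["doc_a"])
print("doc_a rotated: ", rotated["doc_a"])
```

Since the "column 0" values before and after rotation have nothing in common while every pairwise similarity is unchanged, the information lives in the relative positions, not in any particular column.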
