Gensim: Plotting a list of words from a Word2Vec model

Question

I have a model trained with Word2Vec. It works well. I would like to plot only the words I have entered in a list. I have written the function below (reusing some code I found) and get the following error message when a vector is added to arr: 'ValueError: all the input arrays must have same number of dimensions'

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def display_wordlist(model, wordlist):
    vector_dim = model.vector_size
    arr = np.empty((0, vector_dim), dtype='f')  # one row per word, vector_dim columns
    word_labels = []

    # get words from word list and append vector to 'arr'
    for wrd in wordlist:
        word_array = model[wrd]
        word_labels.append(wrd)
        arr = np.append(arr, np.array(word_array), axis=0)  # This goes wrong (ValueError)

    # Use tsne to reduce to 2 dimensions
    tsne = TSNE(perplexity=65,n_components=2, random_state=0)
    np.set_printoptions(suppress=True)
    Y = tsne.fit_transform(arr)

    x_coords = Y[:, 0]
    y_coords = Y[:, 1]
    # display plot
    plt.figure(figsize=(16, 8)) 
    plt.plot(x_coords, y_coords, 'ro')

    for label, x, y in zip(word_labels, x_coords, y_coords):
        plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points')
    plt.xlim(x_coords.min()+0.00005, x_coords.max()+0.00005)
    plt.ylim(y_coords.min()+0.00005, y_coords.max()+0.00005)
    plt.show()

Answer 1

arr has a shape of (0, vector_dim) and word_array has a shape of (vector_dim,). The first is two-dimensional and the second is one-dimensional, which is why you are getting that error.

Simply reshaping word_array into a single row does the trick:

word_array = model[wrd].reshape(1, -1)
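
The shape mismatch can be reproduced without a trained model. The following is a minimal sketch using NumPy only; `vector_dim = 4` and the sample vector are made-up values for illustration, not taken from the question.

```python
import numpy as np

vector_dim = 4  # hypothetical embedding size, for illustration only
arr = np.empty((0, vector_dim), dtype="f")     # 2-D array, shape (0, 4)
word_array = np.arange(vector_dim, dtype="f")  # 1-D array, shape (4,)

# Appending a 1-D array to a 2-D array along axis 0 raises ValueError.
try:
    np.append(arr, word_array, axis=0)
    append_failed = False
except ValueError:
    append_failed = True

# Reshaping to one row, shape (1, 4), makes the number of dimensions agree.
arr = np.append(arr, word_array.reshape(1, -1), axis=0)

# Alternative: collect the vectors in a list and stack them once at the end.
stacked = np.vstack([word_array, word_array * 2])

print(append_failed, arr.shape, stacked.shape)
```

Stacking once at the end also avoids copying the whole array on every loop iteration, which is what repeated `np.append` calls do.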

Sidenote

Why are you passing the word list instead of "querying" the model for it?

wordlist = list(model.wv.vocab)
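
Note that `model.wv.vocab` is the Gensim 3.x attribute; Gensim 4.x replaced it with `index_to_key` (a list) and `key_to_index` (a dict). The sketch below shows one way to read the vocabulary under either version. The helper names `vocabulary_words` and `known_words` are hypothetical, and the two `SimpleNamespace` objects are stand-ins for a real model so the example runs without Gensim installed.

```python
from types import SimpleNamespace

def vocabulary_words(model):
    """Return the model's known words as a list (hypothetical helper)."""
    wv = model.wv
    # Gensim 4.x exposes the vocabulary as the `index_to_key` list.
    if hasattr(wv, "index_to_key"):
        return list(wv.index_to_key)
    # Gensim 3.x exposes it as the `vocab` dict keyed by word.
    return list(wv.vocab)

def known_words(model, wordlist):
    """Keep only the words the model has a vector for, preserving order."""
    vocab = set(vocabulary_words(model))
    return [w for w in wordlist if w in vocab]

# Stand-ins for real models, for illustration only.
old_style = SimpleNamespace(wv=SimpleNamespace(vocab={"cat": None, "dog": None}))
new_style = SimpleNamespace(wv=SimpleNamespace(index_to_key=["cat", "dog"]))

print(known_words(old_style, ["dog", "fish", "cat"]))
print(known_words(new_style, ["dog", "fish", "cat"]))
```

Filtering a hand-picked list this way keeps the plot focused on the words of interest while skipping any that are missing from the model.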

Answer 2

Thanks. I have now modified my code and it delivers the correct result:

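import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def display_wordlist(model, wordlist):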
    vectors = [model[word] for word in wordlist if word in model.wv.vocab.keys()]
    word_labels = [word for word in wordlist if word in model.wv.vocab.keys()]
    word_vec_zip = zip(word_labels, vectors)

    # Convert to a dict and then to a DataFrame
    word_vec_dict = dict(word_vec_zip)
    df = pd.DataFrame.from_dict(word_vec_dict, orient='index')

    # Use tsne to reduce to 2 dimensions
    tsne = TSNE(perplexity=65,n_components=2, random_state=0)
    np.set_printoptions(suppress=True)
    Y = tsne.fit_transform(df)

    x_coords = Y[:, 0]
    y_coords = Y[:, 1]
    # display plot
    plt.figure(figsize=(16, 8)) 
    plt.plot(x_coords, y_coords, 'ro')

    for label, x, y in zip(df.index, x_coords, y_coords):
        plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points')
    plt.xlim(x_coords.min()+0.00005, x_coords.max()+0.00005)
    plt.ylim(y_coords.min()+0.00005, y_coords.max()+0.00005)
    plt.show()
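
One caveat when plotting a short word list: t-SNE's perplexity has to be smaller than the number of points, and recent scikit-learn releases raise an error otherwise, so the fixed `perplexity=65` above only works with more than 65 words. A minimal sketch of clamping the value follows; `safe_perplexity` is a hypothetical helper, not part of scikit-learn.

```python
def safe_perplexity(n_samples, requested=65.0):
    """Clamp the requested perplexity so it stays below n_samples.

    Hypothetical helper: returns `requested` when there are enough
    points, otherwise the largest whole value below n_samples.
    """
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two points")
    return float(min(requested, n_samples - 1))

print(safe_perplexity(10))
print(safe_perplexity(500))
```

It could then be used as `TSNE(perplexity=safe_perplexity(len(df)), n_components=2, random_state=0)` inside the function above.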

Answer 1: score 1, posted 2019-05-10 17:54:24
Answer 2: score 0, posted 2019-05-13 15:39:16
