Find most similar words to randomly initialized array

Question

Using the Gensim package, I have trained a word2vec model on the corpus that I am working with as follows:

word2vec = Word2Vec(all_words, min_count = 3, size = 512, sg = 1)

Using NumPy, I have initialized a random array with the same dimensionality:

vector = (rand(512)-0.5) *20

Now, I would like to find the words in the word2vec model that are most similar to the random vector that I initialized.

For words in the model's vocabulary, you can run:

word2vec.most_similar('word')

And the output is a list of the most similar words and their corresponding similarity scores.

I would like to get a similar output for my initialized array.

However, when I run:

word2vec.most_similar(vector)

I get the following error:

Traceback (most recent call last):

  File "<ipython-input-297-3815cf183d05>", line 1, in <module>
    word2vec.most_similar(vector)

  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\utils.py", line 1461, in new_func1
    return func(*args, **kwargs)

  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\base_any2vec.py", line 1383, in most_similar
    return self.wv.most_similar(positive, negative, topn, restrict_vocab, indexer)

  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 549, in most_similar
    for word, weight in positive + negative:

TypeError: cannot unpack non-iterable numpy.float64 object

What can I do to overcome this error and find the most similar words to my arrays?

I've checked this and this page. However, it is unclear to me how I could solve my problem with these suggestions.

Answer 1

You are trying to see if a floating point number is similar to a string, and that doesn't work (cannot unpack non-iterable numpy.float64 object).

What you need to do is properly generate random strings, not random floating point numbers. Once this is done, your code will work. See also the documentation, which states list of str (https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.most_similar).

Answer 2

Gensim's KeyedVectors .most_similar() method can take raw vectors as its target. However, its current argument-type detection (at least through gensim-3.8.3) mistakes a single bare vector for a list of keys and iterates over its individual floats, which is what produces the TypeError above. To avoid that, pass the vector explicitly as one member of a list for the named positive parameter.

Specifically, this should work:

similars = word2vec.wv.most_similar(positive=[vector,])
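
For intuition about what that call returns, here is a minimal NumPy-only sketch of ranking words by cosine similarity to a raw vector. It is not gensim's implementation; the function name most_similar_to_vector, the toy vocabulary, and the random "embeddings" are all hypothetical stand-ins for a trained model.

```python
import numpy as np


def most_similar_to_vector(query, vocab, vectors, topn=10):
    """Rank vocabulary words by cosine similarity to a raw query vector.

    query   -- 1-D array of shape (dim,)
    vocab   -- list of words, aligned with the rows of `vectors`
    vectors -- 2-D array of shape (len(vocab), dim)
    """
    # Scale every row and the query to unit length, so that a plain
    # dot product equals the cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    sims = unit_vectors @ unit_query

    # Indices of the `topn` highest similarities, best first.
    best = np.argsort(-sims)[:topn]
    return [(vocab[i], float(sims[i])) for i in best]


# Toy stand-in for a trained model (hypothetical data, not real embeddings).
rng = np.random.default_rng(0)
vocab = ["alpha", "beta", "gamma", "delta"]
vectors = rng.normal(size=(len(vocab), 512))

# Same construction as in the question: uniform values in [-10, 10).
query = (rng.random(512) - 0.5) * 20

for word, score in most_similar_to_vector(query, vocab, vectors, topn=3):
    print(word, round(score, 4))
```

The result has the same shape as the output of most_similar: a list of (word, similarity) pairs, highest similarity first. Keep in mind that a purely random vector in 512 dimensions will typically be nearly orthogonal to every word vector, so the reported similarities are likely to be close to zero and the ranking not very meaningful.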

2 answers

solution1
1 2020-08-21 14:18:43

solution2
1 ACCEPTED 2020-08-21 16:03:18
