DeprecationWarning in Gensim `most_similar`?

While implementing Word2Vec in Python 3.7, I ran into an unexpected warning about deprecation. What exactly does the deprecation warning for 'most_similar' in gensim's Word2Vec mean?

Currently, I am getting the following warnings.

DeprecationWarning: Call to deprecated most_similar (Method will be removed in 4.0.0, use self.wv.most_similar() instead).
  model.most_similar('hamlet')
FutureWarning: Conversion of the second argument of issubdtype from int to np.signedinteger is deprecated. In future, it will be treated as np.int32 == np.dtype(int).type.
  if np.issubdtype(vec.dtype, np.int):

How can I get rid of these warnings? Any help is appreciated.

The code I have tried is as follows.

import re
from gensim.models import Word2Vec
from nltk.corpus import gutenberg

sentences = list(gutenberg.sents('shakespeare-hamlet.txt'))   
print('Type of corpus: ', type(sentences))
print('Length of corpus: ', len(sentences))

for i in range(len(sentences)):
    sentences[i] = [word.lower() for word in sentences[i] if re.match('^[a-zA-Z]+', word)]
print(sentences[0])    # title, author, and year
print(sentences[1])
print(sentences[10])
model = Word2Vec(sentences=sentences, size = 100, sg = 1, window = 3, min_count = 1, iter = 10, workers = 4)
model.init_sims(replace = True)
model.save('word2vec_model')
model = Word2Vec.load('word2vec_model')
model.most_similar('hamlet')

It's a warning that the method is about to become obsolete and stop working.

Usually things stay deprecated for a few versions, which gives anyone using them enough time to move to the new method before the old one is removed.
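To make the mechanism concrete, here is a minimal sketch of how a library might keep an old method alive while warning about it. The classes ToyModel and ToyKeyedVectors are hypothetical stand-ins written for illustration only; this is not gensim's actual code.

```python
import warnings


class ToyKeyedVectors:
    """Hypothetical stand-in for the object a model exposes as `wv`."""

    def most_similar(self, word):
        # A real implementation would rank words by vector similarity.
        return [(word + "_neighbour", 1.0)]


class ToyModel:
    """Hypothetical model; not gensim's actual implementation."""

    def __init__(self):
        self.wv = ToyKeyedVectors()

    def most_similar(self, word):
        # Old entry point: still works, but warns and forwards to the new one.
        warnings.warn(
            "Call to deprecated most_similar, use self.wv.most_similar() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.wv.most_similar(word)


model = ToyModel()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_result = model.most_similar("hamlet")     # emits DeprecationWarning
    new_result = model.wv.most_similar("hamlet")  # emits nothing

print(old_result == new_result)
print([w.category.__name__ for w in caught])
```

Both calls return the same result; only the old entry point produces a warning, which is why switching the call site is enough.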

They've moved most_similar to wv.

So most_similar() should look something like:

model.wv.most_similar('hamlet')


Hope this helps

Edit : using wv.most_similar()

import re
from gensim.models import Word2Vec
from nltk.corpus import gutenberg

sentences = list(gutenberg.sents('shakespeare-hamlet.txt'))   
print('Type of corpus: ', type(sentences))
print('Length of corpus: ', len(sentences))

for i in range(len(sentences)):
    sentences[i] = [word.lower() for word in sentences[i] if re.match('^[a-zA-Z]+', word)]
print(sentences[0])    # title, author, and year
print(sentences[1])
print(sentences[10])
model = Word2Vec(sentences=sentences, size = 100, sg = 1, window = 3, min_count = 1, iter = 10, workers = 4)
model.init_sims(replace = True)
model.save('word2vec_model')
model = Word2Vec.load('word2vec_model')
similarities = model.wv.most_similar('hamlet')
for word , score in similarities:
    print(word , score)

A deprecation warning indicates that you are using something that may be removed in a future version of a language or library, often because it has been replaced by something else. The warning message usually tells you what the replacement is.

It appears that the warnings originate inside the Word2Vec library, and not in your code. Removing them at the source would entail going into that library and changing its code.

Try doing what it tells you to do.

Change your model.most_similar('hamlet') to model.wv.most_similar('hamlet')

I am unfamiliar with this package, so adjust to how it would work for your use.
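If a warning really does come from inside a third-party library and you cannot act on it, one option is to silence it around the call using the standard warnings module. The sketch below is only an illustration: noisy_library_call is a hypothetical stand-in for library code that warns internally, and call_quietly is a helper name invented here.

```python
import warnings


def noisy_library_call():
    # Hypothetical stand-in for a third-party function that warns internally.
    warnings.warn("internal API will change", FutureWarning)
    return 42


def call_quietly(func, *args, **kwargs):
    """Run func with FutureWarning suppressed, restoring the filters afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


result = call_quietly(noisy_library_call)
print(result)
```

Because the filter is applied inside catch_warnings, it is undone when the block exits, so warnings elsewhere in your program are still shown. Fixing your own call sites first is preferable; suppression only hides the message.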

So, Gensim here is telling you that eventually you will not be able to call the most_similar method directly on the Word2Vec model. Instead, you will need to call it on the model.wv object, which holds the keyed vectors that are stored when you train a model.

After the update to version 4.0.0, the function model.most_similar() will be removed. So what you can do is change the call to model.wv.most_similar(). The same goes for the function model.similarity(). You have to change it to model.wv.similarity().
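As a small sketch of that change, the two helpers below route both lookups through model.wv. The helper names nearest_words and word_similarity are made up for this example; the only calls they rely on are wv.most_similar (with its topn argument) and wv.similarity, and they assume you pass in a trained Word2Vec model.

```python
def nearest_words(model, word, topn=5):
    """Return (word, score) pairs using the model's keyed vectors (model.wv)."""
    return model.wv.most_similar(word, topn=topn)


def word_similarity(model, first, second):
    """Return the similarity of two words using the model's keyed vectors."""
    return model.wv.similarity(first, second)
```

With the model from the question, nearest_words(model, 'hamlet') would replace the old model.most_similar('hamlet') call, provided the word is in the vocabulary.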
