
python word2vec context similarity using surrounding words

I would like to use embeddings made by w2v in order to obtain the most likely substitute words GIVEN a context (surrounding words), rather than supplying an individual word.

Example: sentence = 'I would like to go to the park tomorrow after school'

If I want to find candidates similar to "park", typically I would just leverage the similarity function from the Gensim model

model.most_similar('park')

and obtain semantically similar words. However, this could give me words similar to the verb 'park' instead of the noun 'park', which is what I was after.

Is there any way to query the model and give it surrounding words as context to provide better candidates?

Word2vec is not, primarily, a word-prediction algorithm. Internally it makes rough predictions in order to train its word-vectors, but those training-time predictions usually aren't the end use for which the word-vectors are wanted.

That said, recent versions of gensim added a predict_output_word() method that (for some model modes) approximates the predictions done during training. It might be useful for your purposes.

Alternatively, checking for the words most_similar() to your initial target word that are also somewhat-similar to the context words might help.

There have been some research papers about ways to disambiguate multiple word senses (like 'to /park/ a car' versus 'walk in a /park/') during word-vector training, but I haven't seen them implemented in open source libraries.

