python word2vec context similarity using surrounding words

I would like to use the embeddings produced by w2v to obtain the most likely substitute words given a context (the surrounding words), rather than supplying an individual word.

Example: sentence = 'I would like to go to the park tomorrow after school'

If I want to find candidates similar to "park", typically I would just leverage the similarity function from the Gensim model:

model.most_similar('park')

and obtain semantically similar words. However, this could give me words similar to the verb 'park' instead of the noun 'park', which is what I was after.

Is there any way to query the model and give it surrounding words as context to provide better candidates?

Word2vec is not, primarily, a word-prediction algorithm. Internally it tries to do semi-predictions, to train its word-vectors, but usually these training-predictions aren't the end-use for which word-vectors are wanted.

That said, recent versions of gensim added a predict_output_word() method that (for some model modes) approximates the predictions done during training. It might be useful for your purposes.
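As a rough sketch of how that could look for the example sentence: the helper below collects the words around the slot you want to fill and passes them to predict_output_word(). The helper name candidates_from_context and its window argument are my own, not part of gensim, and `model` is assumed to be a gensim Word2Vec model you have already trained with negative sampling.

```python
def candidates_from_context(model, tokens, position, window=5, topn=10):
    """Return (word, probability) guesses for the slot at tokens[position].

    Assumption: `model` is a trained gensim Word2Vec model that supports
    predict_output_word() (i.e. it was trained with negative sampling).
    """
    # Take up to `window` tokens on each side, leaving out the target itself.
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]
    context = left + right

    # predict_output_word() can return None (for example when none of the
    # context words are in the vocabulary), so fall back to an empty list.
    return model.predict_output_word(context, topn=topn) or []
```

With the sentence from the question you would call it as candidates_from_context(model, sentence.split(), 7), since 'park' is the token at index 7. How good the guesses are depends heavily on the training corpus and model settings.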

Alternatively, checking for words that are most_similar() to your initial target word and also somewhat similar to the context words might help.
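One way to sketch that idea: take the most_similar() neighbours of the target word, then re-order them by how close each one is to the context words as a group. The function name rerank_by_context is made up for illustration, `kv` is assumed to be the model's KeyedVectors (model.wv), and scoring with n_similarity() is just one reasonable choice among several.

```python
def rerank_by_context(kv, target, context, topn=50):
    """Re-order the neighbours of `target` by similarity to the context.

    Assumption: `kv` is a gensim KeyedVectors instance (e.g. model.wv).
    Returns a list of (word, context_similarity) pairs, best first.
    """
    # Keep only context words the model knows, and drop the target itself.
    context = [w for w in context if w in kv and w != target]

    # Start from the ordinary context-free neighbours of the target word.
    candidates = kv.most_similar(target, topn=topn)
    if not context:
        return candidates

    # n_similarity() compares the mean vector of each word set by cosine.
    scored = [(word, float(kv.n_similarity([word], context)))
              for word, _ in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, rerank_by_context(model.wv, 'park', sentence.split()) should push neighbours that fit "go to the ... after school" ahead of ones related to parking a car, though with a single vector per word this is only a heuristic.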

There have been some research papers about ways to disambiguate multiple word senses (like 'to /park/ a car' versus 'walk in a /park/') during word-vector training, but I haven't seen them implemented in open source libraries.
