
Find similarity of a sentence with 6 basic emotions using WordNet

Original post: 2016-01-23 13:35:51. Tags: nltk, wordnet, emotion, senti-wordnet

I'm working on a project, and part of it needs to detect the emotion of the text we work on.

For example:

He is happy to go home.

I'll take two words from the above sentence, i.e. "happy" and "home".

I'll have a table containing the 6 basic emotions (happy, sad, fear, anger, disgust, surprise).

Each of these emotions will have some synsets associated with it.

I need to find the similarity between these synsets and the word "happy", and then the similarity between these synsets and the word "home".

I tried to use WordNet for this purpose but couldn't understand how WordNet works, as I'm new to this.

1 Solution

I think you want to find the words in a sentence that are similar to any of the words representing the 6 basic emotions. If I am correct, I think you can use the following solution.

First, extract the synset of each word sense representing the 6 basic emotions. Then form a vectorized representation of each of these synsets (collections of synonymous words). You can do this using the word2vec tool available at https://code.google.com/archive/p/word2vec/ .
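The synset extraction step could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: it assumes NLTK with the WordNet corpus downloaded, and it assumes that "the synsets of an emotion" means every synset WordNet returns for that emotion word. The helper names `collect_lemmas`, `wordnet_lemmas`, and `emotion_lemmas` are hypothetical and chosen for illustration.

```python
def collect_lemmas(synsets):
    """Return the unique, lowercased lemma names of the given synsets.

    Works with any objects exposing lemma_names(), such as NLTK Synset objects.
    Order of first appearance is preserved.
    """
    seen = []
    for synset in synsets:
        for name in synset.lemma_names():
            word = name.lower()
            if word not in seen:
                seen.append(word)
    return seen


def wordnet_lemmas(word):
    """Collect the synonyms found in every WordNet synset of `word`.

    Assumption: NLTK is installed and nltk.download("wordnet") has been run.
    """
    from nltk.corpus import wordnet as wn  # lazy import: optional dependency
    return collect_lemmas(wn.synsets(word))


EMOTIONS = ["happy", "sad", "fear", "anger", "disgust", "surprise"]


def emotion_lemmas(emotions=EMOTIONS, lookup=wordnet_lemmas):
    """Map each emotion word to the synonyms gathered from its synsets."""
    return {emotion: lookup(emotion) for emotion in emotions}
```

Calling `emotion_lemmas()` with the defaults needs the WordNet data; passing a different `lookup` function makes it possible to try the logic without it. Note that WordNet writes multi-word lemmas with underscores, and those are unlikely to appear in a word2vec vocabulary.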

Suppose "happy" has a1, a2, a3 as its synonymous words. Then:
1. First, train the word2vec tool on any large English corpus, e.g. the Bojar corpus.
2. Using the trained word2vec model, obtain the word embeddings (vectorized representations) of each synonymous word a1, a2, a3.
3. The vectorized representation of the synset of "happy" is then the average of the vectorized representations of a1, a2, a3.
4. In this way you can obtain a vectorized representation of the synset of each of the 6 basic emotions.
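The averaging in step 3 might look something like the sketch below. It assumes the embeddings are already available as a plain dictionary from word to vector. The tiny 2-dimensional vectors are invented for illustration and do not come from any trained model, and `average_vector` is a hypothetical helper name.

```python
def average_vector(words, embeddings):
    """Average the embedding vectors of the words found in the vocabulary.

    `embeddings` maps word -> list of floats. Words missing from it are
    skipped. Returns None if none of the words is known.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "happy": [1.0, 0.0],
    "glad": [0.8, 0.2],
    "felicitous": [0.6, 0.4],
}

# "unknown" is not in the vocabulary, so it is ignored in the average.
happy_vector = average_vector(
    ["happy", "glad", "felicitous", "unknown"], toy_embeddings
)
print(happy_vector)
```

Skipping out-of-vocabulary words is one possible design choice; a real vocabulary will not contain every WordNet lemma, so some handling of missing words is needed either way.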

Now, for the given sentence, find the vectorized representation of each of its words using the vocabulary generated by the trained word2vec model. You can then use cosine similarity ( https://en.wikipedia.org/wiki/Cosine_similarity ) to measure how close each word is to the synset of each of the 6 basic emotions. In this way you can determine the emotion (at a basic level) of the sentence.
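As a rough illustration of this last step, the sketch below computes the cosine similarity between a word vector and each emotion vector and picks the highest-scoring emotion. The vectors are toy values made up for the example (real ones would come from the trained model), only three of the six emotions are shown, and `closest_emotion` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_emotion(word_vector, emotion_vectors):
    """Return (emotion, similarity) for the emotion most similar to the word."""
    scores = {
        emotion: cosine_similarity(word_vector, vec)
        for emotion, vec in emotion_vectors.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy vectors invented for illustration; not the output of a trained model.
emotion_vectors = {"happy": [0.8, 0.2], "sad": [-0.7, 0.3], "fear": [-0.2, 0.9]}
word_vectors = {"happy": [1.0, 0.0], "home": [0.3, 0.4]}

for word, vec in word_vectors.items():
    print(word, closest_emotion(vec, emotion_vectors))
```

How the per-word results are combined into one label for the whole sentence (for example, taking the emotion with the highest score over all words) is left open here, since the answer does not specify it.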

Source of the technique: the research paper "Unsupervised Most Frequent Sense Detection using Word Embeddings" by Sudha et al. ( http://www.aclweb.org/anthology/N15-1132 )