Clustering synonym words using NLTK and Wordnet

Question

Given a set of words V, I would like to group the synonyms in V together. Is there a built-in function in NLTK and WordNet that takes V as input and automatically clusters its words by synonymy?

I already know how to extract the synonyms of each word, but that is not what I am looking for. With that approach, things get complicated when the synonym sets intersect, or when one is a subset or superset of another, so I would have to write a function to resolve the conflicts.

As an example, let's consider

V = ["good","constipate","bad","nice","defective","right","respectable","powerful"]

What I want to get as output is:

[('constipate',), ('nice',), ('bad', 'defective'), ('good', 'powerful', 'respectable', 'right')]

Depending on the desired size or number of clusters, some sets might split into several sets or merge together. Here, I only care about the words in V and their synonyms within V.

Answer 1

Yes, there is a way to do this using NLTK and WordNet. The following example uses the built-in synsets to look up synonyms for 'book':

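import nltk
from nltk.corpus import wordnet

# Requires the WordNet corpus: nltk.download('wordnet')
# Collect every lemma name from every synset of 'book'.
synonyms = []

for syn in wordnet.synsets('book'):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())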

The resulting synonyms for 'book' are:

print(synonyms)
>>['book', 'book', 'volume', 'record', 'record_book', 'book', 'script', 'book', 'playscript', 'ledger', 'leger', 'account_book', 'book_of_account', 'book', 'book', 'book', 'rule_book', 'Koran', 'Quran', "al-Qur'an", 'Book', 'Bible', 'Christian_Bible', ..]

The length of synonyms:

len(synonyms)
>>38

Note: Some of these synonyms come from verb senses, and many entries simply repeat 'book' for its different senses. Taking the set of synonyms leaves fewer, unique words, as shown in the following code:

len(set(synonyms))
>>25

The resulting set is:

{'record', 'Quran', 'Holy_Scripture', 'Koran', 'Good_Book', 'playscript', 'book', 'Word_of_God', 'hold', 'Holy_Writ', 'script', 'leger', 'book_of_account', 'Scripture', 'ledger', 'reserve', 'volume', 'record_book', "al-Qur'an", 'Christian_Bible', 'Word', 'rule_book', 'Bible', 'Book', 'account_book'}
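To my knowledge NLTK has no single built-in call that clusters a word list by synonymy, so the per-word lookup above still has to be combined by hand. The following is only a minimal sketch of one way to do that, not a definitive method: it treats two words of V as belonging together when one appears in the other's synonym set, and merges overlapping groups with a small union-find. The helper names `wordnet_synonyms` and `cluster_by_shared_synonyms` are hypothetical, and the optional `pos` argument is an assumption added here to let you restrict the lookup to one part of speech (for example `wordnet.ADJ`).

```python
from collections import defaultdict


def wordnet_synonyms(word, pos=None):
    """Return all lemma names WordNet lists for `word` (hypothetical helper).

    Assumes NLTK is installed and the WordNet corpus has been downloaded.
    """
    from nltk.corpus import wordnet  # lazy import: the clustering below does not need NLTK
    return {lemma.name()
            for syn in wordnet.synsets(word, pos=pos)
            for lemma in syn.lemmas()}


def cluster_by_shared_synonyms(words, synonyms_of):
    """Group `words` into clusters of mutual synonyms (hypothetical helper).

    Two words land in the same cluster when one is listed among the other's
    synonyms, directly or through a chain of other words in `words`.
    Synonyms that are not in `words` are ignored.
    """
    vocab = list(dict.fromkeys(words))   # drop duplicates, keep order
    parent = {w: w for w in vocab}       # union-find forest

    def find(w):
        # Walk up to the root, shortening the path as we go.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for w in vocab:
        for s in synonyms_of(w):
            if s in parent and s != w:
                root_w, root_s = find(w), find(s)
                if root_w != root_s:
                    parent[root_s] = root_w   # merge the two groups

    clusters = defaultdict(list)
    for w in vocab:
        clusters[find(w)].append(w)

    # Smallest clusters first, then alphabetical, for a stable result.
    return sorted((tuple(sorted(c)) for c in clusters.values()),
                  key=lambda c: (len(c), c))
```

With WordNet available, `cluster_by_shared_synonyms(V, wordnet_synonyms)` would apply this to the list from the question. The exact grouping depends on what WordNet contains, and because merging is transitive, loosely related words can end up chained into one large cluster. Splitting or combining clusters by size, as the question mentions, would need an extra step on top of this.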
