
Using WordNet with NLTK to find synonyms that make sense

I want to input a sentence and output the same sentence with the hard words replaced by simpler ones.

I'm using NLTK to tokenize sentences and tag words, but I'm having trouble using WordNet to find a synonym for the specific meaning of a word that is intended in context.

For example:

Input: "I refuse to pick up the refuse"

Refuse #1 may already be the simplest word for "reject", but refuse #2 means garbage, and there are simpler words that could go there.

NLTK might be able to tag refuse #2 as a noun, but how do I then get synonyms for refuse in its "trash" sense from WordNet?

It sounds like you want synonyms chosen according to the word's part of speech (i.e. noun, verb, etc.).

The following code finds synonyms for each word in a sentence based on its part of speech. References:

  1. Extract Word from Synset using Wordnet in NLTK 3.0
  2. Printing the part of speech along with the synonyms of the word

Code

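import nltk
nltk.download('popular')  # one-time download of tokenizer, tagger and WordNet data
from nltk.corpus import wordnet as wn

def get_synonyms(word, pos):
    """Yield WordNet synonyms of word for the given Penn Treebank POS tag."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()

def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                  'VB': wn.VERB, 'RB': wn.ADV}
    try:
        # Only the first two characters matter: VBP -> VB, NNS -> NN, etc.
        return morphy_tag[penntag[:2]]
    except KeyError:
        # No WordNet equivalent (e.g. PRP, DT, TO): '' makes wn.synsets return
        # nothing, whereas None would search every part of speech.
        return None if returnNone else ''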
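The lookup in pos_to_wordnet_pos works because Penn Treebank tags belonging to the same coarse class share their first two characters. A minimal standalone sketch of just that mapping, without NLTK, so it can be checked in isolation; the letter codes are the values NLTK uses for wn.NOUN ('n'), wn.VERB ('v'), wn.ADJ ('a') and wn.ADV ('r'), and the helper name penn_to_wordnet is hypothetical.

```python
# WordNet POS letters; NLTK exposes the same values as wn.NOUN, wn.VERB, wn.ADJ, wn.ADV.
WORDNET_POS = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def penn_to_wordnet(penn_tag):
    """Return the WordNet POS letter for a Penn Treebank tag, or None if there is none."""
    # Tags in one coarse class share a two-letter prefix: VBP/VBD/VBZ -> VB, NNS/NNP -> NN, JJR -> JJ.
    return WORDNET_POS.get(penn_tag[:2])


# Tags from the example sentence, plus a few inflected variants.
for tag in ["PRP", "VBP", "TO", "VB", "RP", "DT", "NN", "NNS", "JJR", "RBS"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Closed-class tags such as PRP, TO, RP and DT map to None here; the answer's function returns '' for them by default, which is why no synonyms are listed for I, to, up and the.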

Example Usage

# Tokenize text
text = nltk.word_tokenize("I refuse to pick up the refuse")

for word, tag in nltk.pos_tag(text):
  print(f'word is {word}, POS is {tag}')

  # Filter for unique synonyms not equal to word and sort.
  unique = sorted(set(synonym for synonym in get_synonyms(word, tag) if synonym != word))

  for synonym in unique:
    print('\t', synonym)

Output

Note that refuse gets a different set of synonyms depending on its POS tag (VBP vs. NN).

word is I, POS is PRP
word is refuse, POS is VBP
     decline
     defy
     deny
     pass_up
     reject
     resist
     turn_away
     turn_down
word is to, POS is TO
word is pick, POS is VB
     beak
     blame
     break_up
     clean
     cull
     find_fault
     foot
     nibble
     peck
     piece
     pluck
     plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
     food_waste
     garbage
     scraps
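The code above stops at listing candidates. To produce the simplified sentence the question asks for, one option is to key replacements on the word together with its two-letter tag prefix, so the verb and the noun refuse are handled separately. A minimal sketch under stated assumptions: rewrite and the replacements table are hypothetical, the single table entry is picked by hand from the NN list above (nothing here chooses the "simplest" synonym automatically), and joining tokens with spaces is a naive stand-in for proper detokenization.

```python
def rewrite(tagged_words, replacements):
    """Rebuild a sentence, swapping words whose (lowercased word, tag prefix) has a replacement."""
    out = []
    for word, tag in tagged_words:
        # Key on the two-letter tag prefix, as pos_to_wordnet_pos does, so VBP and NN differ.
        replacement = replacements.get((word.lower(), tag[:2]))
        if replacement is None:
            out.append(word)
        else:
            # WordNet joins multiword lemmas with underscores (pass_up, food_waste).
            out.append(replacement.replace("_", " "))
    # Naive detokenization: fine for this sentence, which has no punctuation.
    return " ".join(out)


# Word/tag pairs as printed in the output above.
tagged = [("I", "PRP"), ("refuse", "VBP"), ("to", "TO"), ("pick", "VB"),
          ("up", "RP"), ("the", "DT"), ("refuse", "NN")]
# Hand-picked from the NN synonym list (hypothetical choice).
replacements = {("refuse", "NN"): "garbage"}
print(rewrite(tagged, replacements))  # I refuse to pick up the garbage
```

Because the key includes the tag, only the noun occurrence is replaced; the verb refuse is left alone.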
