
I am having problems doing Word Sense Disambiguation in Python using the Lesk algorithm

I am new to Python and NLTK, so please bear with me. I want to find the sense of a word in the context of a sentence. I am using the Lesk WSD algorithm, but it gives different outputs every time I run it. I know that Lesk has some level of inaccuracy, but I think supplying a POS tag will increase its accuracy.

The lesk() function takes a POS tag as an argument, but it expects WordNet-style tags such as 'n', 's', 'v' rather than 'NN', 'VBP' and the other tags that the pos_tag() function outputs. I would like to know how to tag words in the 'n', 's', 'v' form, or whether there is a way to convert 'NN', 'VBP' and the other tags into 'n', 's', 'v', so that I can pass them to the lesk(context_sentence, word, pos_tag) function.

Afterwards, I calculate the sentiment score of each word using SentiWordNet.

    from nltk.wsd import lesk
    from nltk import word_tokenize
    import nltk, re, pprint
    from nltk.corpus import sentiwordnet as swn

    def word_sense():

        sent = word_tokenize("He should be happy.")
        word = "be"
        pos = "v"
        score = lesk(sent,word,pos)
        print(score)
        print (str(score),type(score))
        set1 = re.findall("'([^']*)'",str(score))[0]
        print (set1)
        bank = swn.senti_synset(str(set1))
        print (bank)

    word_sense()

nltk.wsd.lesk does not return a score; it returns the predicted Synset:

>>> from nltk.corpus import wordnet as wn
>>> from nltk.corpus import sentiwordnet as swn
>>> from nltk import word_tokenize
>>> from nltk.wsd import lesk
>>> sent = word_tokenize("He should be happy".lower())
>>> lesk(sent, 'be', 'v')
Synset('equal.v.01')

lesk is not perfect; it should only be used as a baseline system for WSD.
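To see why it is only a baseline, here is a minimal sketch of the simplified Lesk idea: pick the sense whose definition shares the most words with the context. This is not NLTK's code. The function name simplified_lesk, the sense keys and the glosses are all made up for illustration and are not real WordNet entries.

```python
def simplified_lesk(context_tokens, glosses):
    """Return the sense key whose gloss shares the most words with the context.

    `glosses` maps a sense key to its definition string. Ties go to the
    sense that appears first in the mapping.
    """
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        # Count the distinct words the gloss has in common with the context.
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical glosses for illustration only (not WordNet definitions).
toy_glosses = {
    "bank.river": "sloping land beside a body of water",
    "bank.money": "a financial institution that accepts deposits of money",
}
sentence = "i sat on the land beside the water".split()
print(simplified_lesk(sentence, toy_glosses))
```

Because the choice rests on raw word overlap, a short context or a gloss that happens to share a common word can easily tip the result towards an unexpected sense.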

Extracting the synset name from the string representation with a regex does work:

>>> import re
>>> ss = str(lesk(sent, 'be', 'v'))
>>> re.findall("'([^']*)'",ss)
['equal.v.01']

But there's a simpler way to get the synset identifier:

>>> lesk(sent, 'be', 'v').name()
u'equal.v.01'

Then you can do:

>>> swn.senti_synset(lesk(sent, 'be', 'v').name())
SentiSynset('equal.v.01')

To convert a POS tag to a WordNet POS, see the related question: Converting POS tags from TextBlob into Wordnet compatible inputs
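As a rough sketch of such a conversion, the Penn Treebank tags from pos_tag() can be mapped by prefix to the single-letter WordNet codes. The helper name penn_to_wordnet is my own, not part of NLTK, and the tagged pairs below are written by hand to resemble pos_tag() output rather than produced by running the tagger.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if no match.

    The letters match NLTK's wordnet constants:
    'n' (NOUN), 'v' (VERB), 'a' (ADJ), 'r' (ADV).
    """
    if tag.startswith("J"):   # JJ, JJR, JJS
        return "a"
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return "v"
    if tag.startswith("N"):   # NN, NNS, NNP, NNPS
        return "n"
    if tag.startswith("RB"):  # RB, RBR, RBS (but not RP, the particle tag)
        return "r"
    return None               # e.g. PRP, MD, DT have no WordNet counterpart


# Hand-written (word, tag) pairs in the shape that pos_tag() returns.
tagged = [("He", "PRP"), ("should", "MD"), ("be", "VB"), ("happy", "JJ")]
for word, tag in tagged:
    print(word, tag, penn_to_wordnet(tag))
```

The result can then be passed as the third argument, e.g. lesk(sent, word, penn_to_wordnet(tag)). Returning None for unmapped tags fits lesk's pos parameter, which defaults to None and then considers all parts of speech. One caveat, as far as I can tell: lesk filters synsets by exact POS, so passing 'a' may skip satellite adjective synsets (POS 's').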
