Word sense disambiguation with WordNet. How to select the words related to the same meaning?

I am using WordNet and NLTK for word sense disambiguation. I am interested in all words that are related to sound. I have a list of such words, and 'roll' is one of them. I check whether any of my sentences contains this word (also taking the POS into account). If it does, I want to keep only the sentences in which the word is used in a sound-related sense. In the example below, that would be the second sentence. My current idea is simply to select the senses whose definition contains the word 'sound', such as 'the sound of a drum (especially a snare drum) beaten rapidly and continuously'. But I suspect there is a more elegant way. Any ideas would be highly appreciated!

from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

# (sentence, WordNet POS tag) pairs; 'n' restricts lesk to noun senses
samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]

word = 'roll'
for sentence, pos_tag in samples:
    # lesk picks the synset whose definition overlaps most with the context
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())

Output:

Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously

You could use WordNet hypernyms (synsets with a more general meaning). My first idea would be to walk upwards from the current synset (using synset.hypernyms()) and check at each step whether I have reached the "sound" synset. When I hit the root (which has no hypernyms, i.e. synset.hypernyms() returns an empty list), I would stop.
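A minimal sketch of that upward walk might look like the following. The helper names hypernym_chain, reaches and demo_with_nltk are made up for illustration; the only thing assumed about a synset is that it offers hypernyms() and name(), as NLTK's Synset objects do. For simplicity the walk follows only the first hypernym at each level, although a synset can have several.

```python
def hypernym_chain(synset):
    """Return the hypernym levels from `synset` up to the root.

    Each level is the list returned by .hypernyms(). Simplification:
    only the first hypernym of each level is followed further.
    """
    levels = []
    current = synset
    while True:
        parents = current.hypernyms()
        if not parents:  # root reached: nothing more general exists
            break
        levels.append(parents)
        current = parents[0]
    return levels


def reaches(synset, target_name):
    """True if a synset named `target_name` appears on the chain above `synset`."""
    return any(parent.name() == target_name
               for level in hypernym_chain(synset)
               for parent in level)


def demo_with_nltk():
    # Requires NLTK with the WordNet corpus downloaded.
    from nltk.corpus import wordnet as wn
    start = wn.synset('paradiddle.n.01')
    for level in hypernym_chain(start):
        print(level)
    print(reaches(start, 'sound.n.04'))
```

Calling demo_with_nltk() prints one list of synsets per level, followed by whether the "sound" synset was found on the way up.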

Now for your two examples, this produces the following sequences of synsets:

Sentence:The van rolled along the highway .
Word synset:Synset('scroll.n.02')
[Synset('manuscript.n.02')]
[Synset('autograph.n.01')]
[Synset('writing.n.02')]
[Synset('written_communication.n.01')]
[Synset('communication.n.02')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]

Sentence:The thunder rolled and the lightning striked .
Word synset:Synset('paradiddle.n.01')
[Synset('sound.n.04')]
[Synset('happening.n.01')]
[Synset('event.n.01')]
[Synset('psychological_feature.n.01')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]

So one of the synsets you might want to look for is sound.n.04. There could be others, though; I think you could try other examples and build up a list.
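Once you have such a list, the filtering step could be sketched roughly as below. The names SOUND_SYNSETS, is_related_to and sound_sentences are hypothetical, and the set contains only sound.n.04 because that is the one synset identified so far. Unlike a single upward chain, this version searches all hypernyms breadth-first, so it does not depend on which parent comes first.

```python
from collections import deque

# Hypothetical starting list; extend it as you find more target synsets.
SOUND_SYNSETS = frozenset({'sound.n.04'})


def is_related_to(synset, target_names=SOUND_SYNSETS):
    """Breadth-first search over the synset itself and all of its hypernyms."""
    seen = set()
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        name = current.name()
        if name in seen:
            continue
        seen.add(name)
        if name in target_names:
            return True
        queue.extend(current.hypernyms())
    return False


def sound_sentences(samples, word, disambiguate):
    """Keep the sentences whose sense of `word` falls under a target synset.

    `disambiguate(sentence, word, pos)` must return a synset or None.
    """
    kept = []
    for sentence, pos in samples:
        synset = disambiguate(sentence, word, pos)
        if synset is not None and is_related_to(synset):
            kept.append(sentence)
    return kept
```

With NLTK, disambiguate could be a thin wrapper around the call from the question, for example lambda s, w, p: lesk(word_tokenize(s.lower()), w, p).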
