Using [x for x in wn.all_synsets('n')] I am able to get a list, allnouns, with all nouns from WordNet with help from NLTK.
The list allnouns looks like this: Synset('pile.n.01'), Synset('compost_heap.n.01'), Synset('mass.n.03') and so on. I can get any element by using allnouns[2], and this should be Synset('mass.n.03').
I would like to extract only the word mass, but for some reason I cannot treat it like a string. Everything I try gives an AttributeError: 'Synset' object has no attribute, or TypeError: 'Synset' object is not subscriptable, or <bound method Synset.name of Synset('mass.n.03')> if I try to use .name or .pos
How about trying this solution:
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('mass.n.03').name().split(".")[0]
'mass'
For your case:
>>> allnouns = [x for x in wn.all_synsets('n')]
The item at index 23 is Synset('substance.n.07'). You can extract the word from its name like this:
>>> allnouns[23].name().split(".")[0]
'substance'
If you want only the word part of the name for every noun synset, collected in a list, then use:
>>> [x.name().split(".")[0] for x in wn.all_synsets('n')]
This should give exactly the result you need.
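Note: In NLTK's WordNet interface, name is a method rather than an attribute, so it has to be called with parentheses, as name(). Without them you get the <bound method Synset.name ...> output shown in the question.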
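The split-on-dot approach above can be wrapped in a small helper. This is only a sketch: head_word is a hypothetical name, not part of NLTK, and it assumes that name() is a method returning a string such as 'mass.n.03'. It splits from the right, on the assumption that the word part could itself contain a period.

```python
def head_word(synset_or_name):
    """Return the word part of a synset name such as 'mass.n.03'.

    Hypothetical helper, not part of NLTK. Accepts either a name string
    or an object whose name() method returns one.
    """
    if isinstance(synset_or_name, str):
        name = synset_or_name
    else:
        name = synset_or_name.name()
    # Split from the right: the last two fields are the POS tag and the
    # sense number, so any period inside the word itself is left alone.
    word, _pos, _sense = name.rsplit(".", 2)
    return word
```

With the list from the question, head_word(allnouns[2]) should then give 'mass'.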
Use Synset.name() to get the name of the synset, whose first dot-separated field is its canonical lemma name:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('mass', 'n')
[Synset('mass.n.01'), Synset('batch.n.02'), Synset('mass.n.03'), Synset('mass.n.04'), Synset('mass.n.05'), Synset('multitude.n.03'), Synset('bulk.n.02'), Synset('mass.n.08'), Synset('mass.n.09')]
>>> wn.synsets('mass', 'n')[0]
Synset('mass.n.01')
>>> wn.synsets('mass', 'n')[0].name()
u'mass.n.01'
>>> wn.synsets('mass', 'n')[0].name().split('.')[0]
u'mass'
But note that a synset is sometimes made up of several lemmas, so you should use Synset.lemma_names() to access all of them if you need the surface word forms of a synset:
>>> wn.synsets('mass', 'n')[0].lemmas()
[Lemma('mass.n.01.mass')]
>>> wn.synsets('mass', 'n')[0].lemma_names()
[u'mass']
>>> wn.synsets('mass', 'n')[0].definition()
u'the property of a body that causes it to have weight in a gravitational field'
In the wn.synsets('mass', 'n')[0] case there is only one lemma attached to the synset, but sometimes there is more than one, e.g.:
>>> wn.synsets('mass', 'n')[1].lemma_names()
[u'batch', u'deal', u'flock', u'good_deal', u'great_deal', u'hatful', u'heap', u'lot', u'mass', u'mess', u'mickle', u'mint', u'mountain', u'muckle', u'passel', u'peck', u'pile', u'plenty', u'pot', u'quite_a_little', u'raft', u'sight', u'slew', u'spate', u'stack', u'tidy_sum', u'wad']
>>> wn.synsets('mass', 'n')[1].definition()
u"(often followed by `of') a large number or amount or extent"
And to extract the full list of noun lemma names in WordNet, you can try:
>>> from itertools import chain
>>> set(chain(*[i.lemma_names() for i in wn.all_synsets('n')]))
>>> len(set(chain(*[i.lemma_names() for i in wn.all_synsets('n')])))
119034
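As a variation on the snippet above, the same set can be built without first materialising a list of lists, by using chain.from_iterable. This is a sketch: unique_lemma_names is a hypothetical helper name, and it only assumes that each item has a lemma_names() method returning a list of strings, as the Synset objects above do.

```python
from itertools import chain


def unique_lemma_names(synsets):
    """Collect the distinct lemma names across an iterable of synsets.

    Hypothetical helper, not part of NLTK. Each item only needs a
    lemma_names() method that returns a list of strings.
    """
    # chain.from_iterable consumes the generator lazily, so no temporary
    # list of lists is built before the set is filled.
    return set(chain.from_iterable(s.lemma_names() for s in synsets))
```

Called as unique_lemma_names(wn.all_synsets('n')), it should produce the same set as the one-liner above.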