Transforming WordNet text output into lists in Python NLTK

I am running the following function:

import nltk
from nltk.corpus import wordnet as wn

def noun_names(list): 
    for synset in list:
        for lemma in synset.lemmas():
            print lemma.name()

noun_names(list(wn.all_synsets(wn.NOUN)))

and it returns a long list of all the names of nouns in wordnet:

For example:

epoch
Caliphate
Christian_era
Common_era
day
year_of_grace
Y2K
generation
anniversary

How do I take this output, which is neither a string nor a list, and turn it into a list? Thanks so much.

Instead of printing to stdout with your:

print lemma.name()

Why not append it to a list and return the list?

def noun_names(list):
    names = []
    for synset in list:
        for lemma in synset.lemmas():
            names.append(lemma.name())
    return names

names = noun_names(list(wn.all_synsets(wn.NOUN)))
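To see why the original call left nothing to work with, here is a minimal sketch (Python 3 syntax, with a few made-up sample names instead of WordNet, and hypothetical function names print_names and collect_names). A function that only prints hands back None, while one that returns its list gives you something to store:

```python
def print_names(names):
    # Writes each name to stdout; nothing is handed back to the caller.
    for name in names:
        print(name)


def collect_names(names):
    # Builds a list and returns it so the caller can keep using it.
    collected = []
    for name in names:
        collected.append(name)
    return collected


sample = ["epoch", "day", "Y2K"]  # made-up sample, not read from WordNet

printed = print_names(sample)
collected = collect_names(sample)

print(printed)    # None, because print_names has no return statement
print(collected)
```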

It's not returning anything. Your function is printing, not returning.

You need to return a list. As a side note, you should rename your function parameter to something other than list. That name shadows Python's built-in list type inside the function, so any later call to list() there would fail.
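A small sketch of what goes wrong with a parameter called list (Python 3 syntax; the function names shadowed and renamed are made up for illustration):

```python
def shadowed(list):
    # Inside this function, "list" refers to the argument, not the built-in type.
    try:
        return list("abc")  # the argument is a plain list object, so it is not callable
    except TypeError as exc:
        return "TypeError: " + str(exc)


def renamed(synsets):
    # With a different parameter name, the built-in list() is still reachable.
    return list("abc")


result_shadowed = shadowed([1, 2, 3])
result_renamed = renamed([1, 2, 3])

print(result_shadowed)
print(result_renamed)
```

The first call reports a TypeError, the second builds a list as usual.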

One option is to modify your function slightly to append to a list and then return that:

def noun_names(word_list):
    lemma_list = []
    for synset in word_list:
        for lemma in synset.lemmas():
            lemma_list.append(lemma.name())
    return lemma_list

Another option is to change the above into a list comprehension:

def noun_names_1(word_list):
    return [lemma.name() for synset in word_list for lemma in synset.lemmas()]

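Both of these functions return a list with the same information. Notice that I removed the list() wrapper around wn.all_synsets(wn.NOUN): the functions only need something they can iterate over, and wn.all_synsets() already returns an iterable (in NLTK 3 it is a generator rather than a list, as far as I can tell).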

lemma_list1 = noun_names(wn.all_synsets(wn.NOUN))
lemma_list2 = noun_names_1(wn.all_synsets(wn.NOUN))
print len(lemma_list1), len(lemma_list2), len(lemma_list1) == len(lemma_list2), lemma_list1 == lemma_list2

That final print statement outputs:

146347 146347 True True

This shows that both lists have the same number of elements (146347 each, hence the first True) and that the lists themselves are equal (the second True). A more appropriate test in the code is:

assert len(lemma_list1) == len(lemma_list2)
assert lemma_list1 == lemma_list2

If the lists aren't the same length or aren't equal, the assert statement will raise an AssertionError.
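If you want to try the same comparison without loading the WordNet corpus, here is a rough sketch in Python 3. FakeLemma, FakeSynset and fake_synsets are hypothetical stand-ins that mimic only the lemmas() and name() calls used above; they are not part of NLTK. It also shows one thing to watch for when the input is a generator: it can be looped over only once.

```python
class FakeLemma:
    """Hypothetical stand-in for an NLTK Lemma; only name() is mimicked."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class FakeSynset:
    """Hypothetical stand-in for an NLTK Synset; only lemmas() is mimicked."""

    def __init__(self, names):
        self._lemmas = [FakeLemma(n) for n in names]

    def lemmas(self):
        return self._lemmas


def fake_synsets():
    # A generator: every call produces a fresh, single-use iterable.
    yield FakeSynset(["epoch"])
    yield FakeSynset(["Christian_era", "Common_era"])
    yield FakeSynset(["day"])


def names_with_loop(synsets):
    names = []
    for synset in synsets:
        for lemma in synset.lemmas():
            names.append(lemma.name())
    return names


def names_with_comprehension(synsets):
    return [lemma.name() for synset in synsets for lemma in synset.lemmas()]


loop_result = names_with_loop(fake_synsets())
comp_result = names_with_comprehension(fake_synsets())
assert len(loop_result) == len(comp_result)
assert loop_result == comp_result

# Reusing one generator object: the second pass finds it already exhausted.
shared = fake_synsets()
first_pass = names_with_loop(shared)
second_pass = names_with_loop(shared)
print(first_pass)
print(second_pass)
```

So if you need to go over the synsets more than once, either call the source again or wrap it in list() once and reuse that list.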

If you just need the list of lemmas, check out OMW (Open Multilingual WordNet): http://compling.hss.ntu.edu.sg/omw/

$ wget http://compling.hss.ntu.edu.sg/omw/wns/eng.zip
$ unzip eng.zip
$ cut -f3 eng/wn-data-eng.tab | (read;cat)
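If you would rather end up with a Python list than shell output, the cut step can be sketched roughly like this (Python 3). It assumes what the command above implies: a tab-separated file whose first line is a header to skip and whose third column holds the lemma. The function name third_column and the sample rows are made up to illustrate that layout; they are not copied from the real file.

```python
import io


def third_column(lines):
    # Skip the first line, as "(read;cat)" does in the shell pipeline.
    rows = iter(lines)
    next(rows, None)
    lemmas = []
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        # Keep only rows that actually have a third tab-separated field.
        if len(fields) >= 3:
            lemmas.append(fields[2])
    return lemmas


# Made-up sample in the assumed layout: header line, then id / type / lemma.
sample = io.StringIO(
    "# made-up header line\n"
    "00001740-a\teng:lemma\table\n"
    "00002098-a\teng:lemma\tunable\n"
)

lemma_list = third_column(sample)
print(lemma_list)
```

For the downloaded data you would pass an open file instead of the sample, for example open("eng/wn-data-eng.tab", encoding="utf-8"), assuming the archive unpacks to that path as in the commands above.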
