What am I missing when getting nouns from sentence and reversed sentence using nltk?

I have an is_noun definition for use with nltk:

is_noun = lambda pos: pos in ('NN', 'NNP', 'NNS', 'NNPS')

Then I have this in a function:

import nltk

def test(text):
    # Tokenize the text, tag each token, and keep the words tagged as nouns.
    tokenized = nltk.word_tokenize(text)
    nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
    print('Nouns:', nouns)
    return nouns

Then I call the function:

test('When will this long and tedious journey ever end? Like all good')

and get:

Nouns: ['journey']

Then I call the same function with the words of the sentence in reverse order:

test('good all Like end? ever journey tedious and long this will When')

Result:

Nouns: ['end']

I expected to get the same nouns from both calls, but that is not the case. What am I doing wrong?

Summary: GIGO (Garbage In => Garbage Out).

As the comment suggests, word order matters. English is rife with words that can act as multiple parts of speech, depending on placement within a phrase. Consider:

You can cage a swallow.
You cannot swallow a cage.

The second text you present is not a grammatical sentence by any means. The best the tagger can determine is that "end" may be the direct object of the verb "like", and is therefore a noun in this case. Similarly, "journey" appears to be the main verb of the second sequence of words.
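To see this for yourself, it helps to print every (word, tag) pair instead of only the filtered nouns. The following is a minimal sketch, not the asker's code: nouns_from_tagged and show_tags are hypothetical helper names, and it assumes the NLTK tokenizer and tagger models have already been fetched with nltk.download (the exact resource names depend on the NLTK version).

```python
# Penn Treebank noun tags, the same four that is_noun checks for.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nouns_from_tagged(tagged):
    """Return the words whose tag is a noun tag, in their original order."""
    return [word for word, tag in tagged if tag in NOUN_TAGS]


def show_tags(text):
    """Print every (word, tag) pair for text, then the nouns among them."""
    # Imported here so nouns_from_tagged can be used without NLTK installed.
    import nltk

    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    print(tagged)
    print("Nouns:", nouns_from_tagged(tagged))
    return tagged
```

Calling show_tags on 'You can cage a swallow.' and 'You cannot swallow a cage.', and then on the original and reversed texts from the question, shows which tag each word received in each position, so you can check directly how the tag of a word such as "journey" or "end" changes with its surroundings.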
