
How to extract overlapping phrases with nltk

I'm using this code to extract phrases that match the tag pattern {NN IN NN}, but the result only contains "relationship between weight". I would like to get "weight of beef" as well.

import nltk

text = "what is the relationship between weight of beef and cooking time"

tokens = nltk.word_tokenize(text)
tagged_text = nltk.pos_tag(tokens)

grammar = r"""
    NP:{<NN><IN><NN>}
"""

cp = nltk.RegexpParser(grammar) 
result = cp.parse(tagged_text)
print(result)

Output:

(S
  what/WP
  is/VBZ
  the/DT
  (NP relationship/NN between/IN weight/NN)
  of/IN
  beef/NN
  and/CC
  cooking/NN
  time/NN)

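A RegexpParser grammar produces non-overlapping chunks: each token can belong to at most one chunk, so once "weight" has been consumed by "relationship between weight" it cannot also start "weight of beef". Since you only need to extract the phrases, you can instead iterate over all trigrams and keep those with the tags you want: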

for (w1, t1), (w2, t2), (w3, t3) in nltk.trigrams(tagged_text):
    if t1 == 'NN' and t2 == 'IN' and t3 == 'NN':
        print(w1, w2, w3)

Outputs:

relationship between weight
weight of beef
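
If you need patterns of other lengths, the same sliding-window idea can be generalized. The following is only a sketch: `find_tag_sequences` is a hypothetical helper name, not part of nltk, and the tagged tokens are hard-coded from the parse output in the question so that the snippet runs without nltk; in practice you would pass the result of nltk.pos_tag(tokens).

```python
def find_tag_sequences(tagged, pattern):
    """Return every run of tokens whose tags equal `pattern`, overlaps included.

    `tagged` is a list of (word, tag) pairs, the shape nltk.pos_tag returns.
    This helper is an illustration, not an nltk function.
    """
    n = len(pattern)
    matches = []
    # Slide a window of length n over the tokens, one position at a time,
    # so a token can appear in more than one match.
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if [tag for _, tag in window] == list(pattern):
            matches.append(" ".join(word for word, _ in window))
    return matches


# Tagged tokens copied from the parse output shown in the question.
tagged_text = [
    ("what", "WP"), ("is", "VBZ"), ("the", "DT"),
    ("relationship", "NN"), ("between", "IN"), ("weight", "NN"),
    ("of", "IN"), ("beef", "NN"), ("and", "CC"),
    ("cooking", "NN"), ("time", "NN"),
]

phrases = find_tag_sequences(tagged_text, ("NN", "IN", "NN"))
print(phrases)  # ['relationship between weight', 'weight of beef']
```

Because the window advances one token at a time instead of skipping past a match, "weight" can end one phrase and begin the next, which is exactly what the chunk grammar does not allow.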

