How to extract overlapping phrases with nltk

Question

I'm using the code below to extract phrases that match the tag pattern {NN IN NN}. The result only contains "relationship between weight", but I also want to get "weight of beef".

import nltk

text = "what is the relationship between weight of beef and cooking time"

tokens = nltk.word_tokenize(text)
tagged_text = nltk.pos_tag(tokens)

grammar = r"""
    NP:{<NN><IN><NN>}
"""

cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_text)
print(result)

(S
  what/WP
  is/VBZ
  the/DT
  (NP relationship/NN between/IN weight/NN)
  of/IN
  beef/NN
  and/CC
  cooking/NN
  time/NN)

Answer 1

A RegexpParser grammar produces non-overlapping chunks: each word can belong to at most one phrase, so once "weight" has been consumed by the first NP it cannot start a second one. But since you only need to extract the phrases, you can simply filter all the trigrams for the tags you want:

for (w1, t1), (w2, t2), (w3, t3) in nltk.trigrams(tagged_text):
    if t1 == 'NN' and t2 == 'IN' and t3 == 'NN':
        print(w1, w2, w3)

Outputs:

relationship between weight
weight of beef
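
If you later need patterns that are not exactly three tags long, the same idea can be written as a sliding window over the tagged tokens. This is only a sketch: match_tag_pattern is a hypothetical helper name, not part of nltk, and the tagged tokens are hard-coded from the parser output in the question so the snippet runs without nltk or its data files.

```python
def match_tag_pattern(tagged, pattern):
    """Return every run of words whose tags equal `pattern`, overlaps included.

    `tagged` is a list of (word, tag) pairs, as returned by nltk.pos_tag.
    `pattern` is a list of tags, e.g. ["NN", "IN", "NN"].
    (Hypothetical helper for illustration; not an nltk function.)
    """
    n = len(pattern)
    matches = []
    # Slide a window of length n over the tokens, one position at a time,
    # so a word can take part in more than one match.
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(tag == want for (_, tag), want in zip(window, pattern)):
            matches.append(" ".join(word for word, _ in window))
    return matches


# Tagged tokens copied from the parser output shown in the question.
tagged_text = [
    ("what", "WP"), ("is", "VBZ"), ("the", "DT"),
    ("relationship", "NN"), ("between", "IN"), ("weight", "NN"),
    ("of", "IN"), ("beef", "NN"), ("and", "CC"),
    ("cooking", "NN"), ("time", "NN"),
]

phrases = match_tag_pattern(tagged_text, ["NN", "IN", "NN"])
for phrase in phrases:
    print(phrase)
```

With the real pipeline you would pass the result of nltk.pos_tag(tokens) instead of the hard-coded list. The comparison is an exact tag match, so tags such as NNS or NNP are not treated as NN unless you extend the check yourself.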

solution1
0 2020-08-04 02:09:53