Chinking using NLTK in Python

I have been trying out some of the examples in the NLTK Book. For example, Chapter 7 discusses chinking with this example:

import nltk

grammar = r"""
  NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
  """
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)

As I understand it, this is supposed to excise "barked at" from the result, but it doesn't. I'm new to Python and NLTK, so what am I missing? Is there something obvious that needs to be updated here? Thanks.

Chunking creates chunks, while chinking breaks up those chunks.

That's exactly what "Python Text Processing with NLTK 2.0 Cookbook" by Jacob Perkins says (I recommend this book since you're new to NLTK).

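That means {} creates chunks and }{ breaks those chunks up into smaller ones (i.e. splits them), but does NOT remove anything. The chinked tokens ("barked" and "at") stay in the tree as direct children of the top-level S node; they are simply no longer part of an NP chunk.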
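A toy model may make the "split, don't delete" behavior concrete. The sketch below is NOT how NLTK implements chinking (RegexpParser works with regular expressions over the tag sequence); `chunk_then_chink` is a hypothetical helper that mimics only this particular grammar: chunk everything, then chink out any token whose tag is VBD or IN.

```python
# Hypothetical helper: a toy model of "chunk everything, then chink".
# It is not NLTK's implementation and only handles a fixed set of chink tags.
def chunk_then_chink(tagged, chink_tags):
    """Return a list whose items are either chunks (lists of tokens)
    or single tokens that were chinked out of a chunk."""
    pieces = []
    current = []  # tokens of the chunk currently being built
    for word, tag in tagged:
        if tag in chink_tags:
            # A chinked token closes the open chunk and stays unchunked.
            if current:
                pieces.append(current)
                current = []
            pieces.append((word, tag))
        else:
            current.append((word, tag))
    if current:
        pieces.append(current)
    return pieces


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

pieces = chunk_then_chink(sentence, {"VBD", "IN"})
for piece in pieces:
    print(piece)

# Flatten the pieces back into tokens: nothing was removed.
flat = [tok for p in pieces for tok in (p if isinstance(p, list) else [p])]
print(flat == sentence)
```

The one big chunk is split into two chunks, with "barked" and "at" left between them as loose tokens, which mirrors the shape of the tree RegexpParser returns.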

For your example, check what this shows:

result.draw()

or alternatively run

from nltk.tree import Tree

Tree('S', [Tree('NP', [('the', 'DT'), ('little', 'JJ'), ('yellow', 'JJ'), ('dog', 'NN')]), ('barked', 'VBD'), ('at', 'IN'), Tree('NP', [('the', 'DT'), ('cat', 'NN')])]).draw()

(Both snippets draw the same tree. The difference is that the first requires your initial example to have run, while the second is self-contained.)
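If no display is available for draw(), printing the tree gives the same information as text. The sketch below reuses the question's grammar and sentence and assumes NLTK 3.x, where Tree.label(), Tree.subtrees() and Tree.leaves() are available. The names np_chunks and outside are only illustrative. It confirms that every input token survives parsing and separates the NP chunks from the chinked tokens.

```python
import nltk

# Grammar and sentence from the question (NLTK Book, Chapter 7).
grammar = r"""
  NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
  """
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

result = nltk.RegexpParser(grammar).parse(sentence)

# Text alternative to result.draw(); no GUI needed.
print(result)

# Nothing was removed: the leaves are exactly the input tokens.
assert result.leaves() == sentence

# Tokens of each NP chunk, and the tokens sitting directly under S.
np_chunks = [t.leaves() for t in result.subtrees(lambda t: t.label() == "NP")]
outside = [child for child in result if not isinstance(child, nltk.Tree)]
print(np_chunks)
print(outside)
```

If what you actually want is a result without "barked at", keep only the NP chunks (np_chunks above) rather than expecting the chink rule to delete tokens.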
