Chinking using NLTK in Python

Question

I have been trying out some of the examples in the Python NLTK Book. For example, Chapter 7 talks about chinking with this example:

import nltk

grammar = r"""
    NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
  """
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)

As I understand it, this is supposed to excise "barked at" from the result, but it doesn't. I am new to Python and NLTK; what am I missing here? Is there something obvious that needs to be updated? Thanks.

Answer 1

Chunking creates chunks, while chinking breaks up those chunks.

That's exactly how Jacob Perkins puts it in "Python Text Processing with NLTK 2.0 Cookbook" (I recommend this book since you're new to NLTK).

That means that {} creates chunks and }{ breaks those chunks up into smaller ones (i.e. separates them), but it does NOT remove anything from the sentence: the chinked tokens are simply left outside any chunk, at the top level of the tree.
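To make the "breaks up but does not remove" point concrete, here is a minimal pure-Python sketch (not NLTK's actual implementation) that mimics the two rules in the question's grammar: first put every token into one chunk, then split that chunk wherever a VBD or IN tag occurs. The helper name `chunk_then_chink` is hypothetical; chunks are represented as plain lists and chinked tokens as bare (word, tag) tuples.

```python
def chunk_then_chink(tagged, chink_tags):
    # Hypothetical helper: mimics {<.*>+} followed by }<tags>+{ on a flat tagged sentence.
    result = []   # top-level items: a list is a chunk, a tuple is a chinked token
    current = []  # the chunk currently being built
    for word, tag in tagged:
        if tag in chink_tags:
            # A chink tag ends the current chunk...
            if current:
                result.append(current)
                current = []
            # ...but the token itself is kept, just outside any chunk.
            result.append((word, tag))
        else:
            current.append((word, tag))
    if current:
        result.append(current)
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]

for item in chunk_then_chink(sentence, {"VBD", "IN"}):
    kind = "chunk" if isinstance(item, list) else "chinked"
    print(kind, item)
```

All eight tokens survive: "barked" and "at" are still in the output, just no longer inside a chunk, which mirrors the shape of the tree NLTK builds for this sentence.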

For your example, check what this shows:

result.draw()

or, alternatively, run:

from nltk.tree import Tree

Tree('S', [Tree('NP', [('the', 'DT'), ('little', 'JJ'), ('yellow', 'JJ'), ('dog', 'NN')]), ('barked', 'VBD'), ('at', 'IN'), Tree('NP', [('the', 'DT'), ('cat', 'NN')])]).draw()

(Both snippets draw the same tree. The difference is that the first requires your initial example to have been run, while the second is standalone.)
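If what you actually want is to drop "barked at", as the question expected, one option is to post-process the parsed tree and keep only the NP subtrees. A minimal sketch, assuming NLTK 3 is installed; the variable names `np_chunks` and `chinked` are illustrative:

```python
import nltk

grammar = r"""
  NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
  """
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]
result = nltk.RegexpParser(grammar).parse(sentence)

# Top-level children of the S tree are either NP subtrees or chinked (word, tag) tuples.
np_chunks = [child for child in result if isinstance(child, nltk.Tree)]
chinked = [child for child in result if not isinstance(child, nltk.Tree)]

for chunk in np_chunks:
    print(" ".join(word for word, tag in chunk.leaves()))
print(chinked)
```

Iterating over `result` yields only its top-level children, so an `isinstance` check is enough to separate chunks from chinked tokens here; `result.subtrees(filter=lambda t: t.label() == "NP")` is an alternative that would also find nested chunks.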

solution1
0 2012-12-21 07:29:13