I have parsed this tagged sentence using NLTK's RegexpParser: the dog chased the black cat. I used the following grammar:
import nltk

tagged_ = [('the', 'DT'), ('dog', 'NN'), ('chased', 'VBD'), ('the', 'DT'), ('black', 'JJ'), ('cat', 'NN')]
# Each chunk rule must be on its own line in the grammar string.
grammar = """
NP: {<DT>?<JJ>*<NN>}
VP: {<MD>?<VBD>}
"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_)
print(result)
result.draw()
This is the output of print(result) and result.draw():
(S (NP the/DT dog/NN) (VP chased/VBD) (NP the/DT black/JJ cat/NN))
Now I want to reorder the subtrees so that (VP chased/VBD) and (NP the/DT dog/NN) swap places, like this:
(S (VP chased/VBD) (NP the/DT dog/NN) (NP the/DT black/JJ cat/NN))
and then display the words as ['chased','the','dog','the','black','cat']. Is there any way to do this?
You can think of an nltk.Tree object as a pair of two values. The first value is the label of the root node and the second value is a list that contains child trees or leaves. You can build a complex tree by appending child trees to the root's list:
>>> from nltk import Tree
>>> tree = Tree('S', [])
>>> np = Tree('NP', ['the', 'dog'])
>>> tree.append(np)
>>> vp = Tree('VP', ['barks'])
>>> tree.append(vp)
>>> print(tree)
(S (NP the dog) (VP barks))
You can iterate over all subtrees with tree.subtrees():
>>> for sub in tree.subtrees():
...     print(sub)
(S (NP the dog) (VP barks))
(NP the dog)
(VP barks)
As you can see, the method outputs all subtrees: in a complex tree you get subtrees, sub-subtrees, sub-sub-subtrees, and so on. So in your case it is better to access only the first level of the tree by index:
>>> new = Tree('S', [])
>>> for i in range(len(tree)):
...     if tree[i].label() == 'VP':
...         new.insert(0, tree[i])   # move the VP to the front
...     else:
...         new.append(tree[i])      # keep everything else in order
>>> print(new)
(S (VP barks) (NP the dog))
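To tie this back to the tagged sentence in the question, here is a minimal sketch of the same idea applied to the RegexpParser output, followed by extracting the plain words. It assumes NLTK 3 (where Tree.label() is available). The names reordered and words are only illustrative. Because the leaves of a chunk tree are (word, tag) pairs, the tags are dropped when building the word list.

```python
from nltk import RegexpParser, Tree

tagged = [('the', 'DT'), ('dog', 'NN'), ('chased', 'VBD'),
          ('the', 'DT'), ('black', 'JJ'), ('cat', 'NN')]

grammar = """
NP: {<DT>?<JJ>*<NN>}
VP: {<MD>?<VBD>}
"""
result = RegexpParser(grammar).parse(tagged)

# Rebuild the top level: VP chunks go to the front, the rest keep their order.
reordered = Tree(result.label(), [])
for child in result:
    # Unchunked tokens are plain (word, tag) tuples, so check the type first.
    if isinstance(child, Tree) and child.label() == 'VP':
        reordered.insert(0, child)
    else:
        reordered.append(child)

# Leaves of a chunk tree are (word, tag) pairs; keep only the words.
words = [word for word, tag in reordered.leaves()]
print(reordered)
print(words)  # ['chased', 'the', 'dog', 'the', 'black', 'cat']
```

Note that insert(0, ...) puts each VP at the very front, so if a sentence contains several VP chunks they end up in reverse order. Collect them in a separate list first if their relative order matters.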