
Swap leaf label of Tree in NLTK


I have parsed the tagged sentence "the dog chased the black cat" using NLTK's RegexpParser with the following grammar:

import nltk
tagged_ = [('the', 'DT'), ('dog', 'NN'), ('chased', 'VBD'), ('the', 'DT'), ('black', 'JJ'), ('cat', 'NN')]

grammar = """
NP: {<DT>?<JJ>*<NN>}
VP: {<MD>?<VBD>}
"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_)
print(result)
result.draw()  # opens a window showing the parse tree

This is the output of print(result) and result.draw():

(S (NP the/DT dog/NN) (VP chased/VBD) (NP the/DT black/JJ cat/NN))

[Figure: the parse tree drawn by result.draw()]

Now I want to reorder the tree so that (VP chased/VBD) and (NP the/DT dog/NN) swap places, like this:

(S (VP chased/VBD) (NP the/DT dog/NN) (NP the/DT black/JJ cat/NN))

and then display the words as ['chased', 'the', 'dog', 'the', 'black', 'cat']. Is there any way to do this?

You can think of an nltk.Tree object as a pair of two values: the first is the label of the root node, and the second is a list that contains the child trees or leaves. (In NLTK 3, Tree is in fact a subclass of Python's list, with the label stored separately and returned by label().) You can build a complex tree by appending child trees to the root's list:

>>> from nltk import Tree
>>> tree = Tree('S', [])
>>> np = Tree('NP', ['the', 'dog'])
>>> tree.append(np)
>>> vp = Tree('VP', ['barks'])
>>> tree.append(vp)
>>> print(tree)
(S (NP the dog) (VP barks))
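Because a Tree is a list of children plus a separately stored label, the usual list operations apply to it. The following is a minimal sketch, assuming NLTK 3 (where `Tree` subclasses `list`), that inspects the example tree and then swaps its two children in place by index. This shortcut only works when you already know the positions you want to exchange.

```python
from nltk import Tree

# Rebuild the small example tree from above.
tree = Tree('S', [Tree('NP', ['the', 'dog']), Tree('VP', ['barks'])])

# The root label is stored separately; the children are the list items.
print(tree.label())                       # S
print(len(tree))                          # 2
print([child.label() for child in tree])  # ['NP', 'VP']

# Since Tree subclasses list, two children can be swapped in place
# with ordinary index assignment.
tree[0], tree[1] = tree[1], tree[0]
print(tree)                               # (S (VP barks) (NP the dog))
print(tree.leaves())                      # ['barks', 'the', 'dog']
```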

You can iterate over all subtrees with tree.subtrees():

>>> for sub in tree.subtrees():
...     print(sub)
...
(S (NP the dog) (VP barks))
(NP the dog)
(VP barks)
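To make the difference between recursive and first-level access concrete, here is a small sketch using a hypothetical deeper tree in which a second NP is nested inside the VP. `subtrees()` reaches the nested NP (and accepts an optional `filter` function), whereas iterating over the tree itself only visits the top-level children.

```python
from nltk import Tree

# Hypothetical deeper tree: a second NP is nested inside the VP.
tree = Tree('S', [
    Tree('NP', ['the', 'dog']),
    Tree('VP', [Tree('V', ['chased']), Tree('NP', ['the', 'cat'])]),
])

# subtrees() walks the whole tree in pre-order, starting with the root.
all_labels = [t.label() for t in tree.subtrees()]
print(all_labels)  # ['S', 'NP', 'VP', 'V', 'NP']

# The optional filter keeps only matching subtrees, at any depth.
nps = list(tree.subtrees(filter=lambda t: t.label() == 'NP'))
print(len(nps))    # 2

# Iterating over the tree itself visits only the first level of children.
print([child.label() for child in tree])  # ['NP', 'VP']
```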

As you can see, the method yields every subtree, starting with the tree itself; in a deeper tree you would also get sub-subtrees, sub-sub-subtrees, and so on. So in your case it is better to access the children of the first level directly:

>>> new = Tree('S', [])
>>> for child in tree:
...     # move VP chunks to the front, keep everything else in order
...     if child.label() == 'VP':
...         new.insert(0, child)
...     else:
...         new.append(child)
...
>>> print(new)
(S (VP barks) (NP the dog))
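Applied to the tree from the question, the same approach might look like the sketch below. It assumes NLTK 3 and reuses the question's grammar. Two details differ from the toy example: the leaves of a RegexpParser tree are (word, tag) tuples, so the words have to be unpacked to get ['chased', 'the', ...]; and tokens that no rule chunks stay at the top level as bare tuples, so the sketch checks isinstance(child, Tree) before calling label().

```python
import nltk
from nltk import Tree

# Tagged sentence and chunk grammar from the question.
tagged_ = [('the', 'DT'), ('dog', 'NN'), ('chased', 'VBD'),
           ('the', 'DT'), ('black', 'JJ'), ('cat', 'NN')]
grammar = """
NP: {<DT>?<JJ>*<NN>}
VP: {<MD>?<VBD>}
"""
result = nltk.RegexpParser(grammar).parse(tagged_)

# Build a new root with the same label; move VP chunks to the front.
reordered = Tree(result.label(), [])
for child in result:
    # Unchunked tokens stay as (word, tag) tuples, which have no label().
    if isinstance(child, Tree) and child.label() == 'VP':
        reordered.insert(0, child)
    else:
        reordered.append(child)

print(reordered)
# (S (VP chased/VBD) (NP the/DT dog/NN) (NP the/DT black/JJ cat/NN))

# Each leaf is a (word, tag) pair, so keep only the word.
words = [word for word, tag in reordered.leaves()]
print(words)  # ['chased', 'the', 'dog', 'the', 'black', 'cat']
```

Note that inserting every VP at position 0 reverses the relative order of the VPs if a sentence contains more than one; collecting them in a separate list first and concatenating avoids that.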
