Creating a full nltk parse tree from a list of nltk subtrees in Python 3.5
I have a list of subtrees which I derived from a parse history formatted as follows:
The parse history:
parse = [('S', 0), ('NP', 1), ('Det', 0), ('N', 0), ('VP', 1), ('V', 4), ('NP', 2), ('NP', 0), ('PN', 1), ('NP', 1), ('Det', 0), ('N', 3)]
Each tuple in the list holds a key into a grammar dictionary, whose values are lists of rules. The second item in the tuple is the index of the rule to use for that key.
The grammar is:
grammar = {'S': [['NP', 'VP']],
           'NP': [['PN'], ['Det', 'N']],
           'VP': [['V'], ['V', 'NP', 'NP']],
           'PN': [['John'], ['Mary'], ['Bill']],
           'Det': [['the'], ['a']],
           'N': [['man'], ['woman'], ['drill sergeant'], ['dog']],
           'V': [['slept'], ['cried'], ['assaulted'],
                 ['devoured'], ['showed']]}
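To make the lookup concrete, here is a minimal sketch of how a history entry selects a rule, plus a quick check for entries whose index has no matching rule. The helper name `rule_for` and the `invalid` list are hypothetical names used only for illustration; they are not part of the original post.

```python
grammar = {'S': [['NP', 'VP']],
           'NP': [['PN'], ['Det', 'N']],
           'VP': [['V'], ['V', 'NP', 'NP']],
           'PN': [['John'], ['Mary'], ['Bill']],
           'Det': [['the'], ['a']],
           'N': [['man'], ['woman'], ['drill sergeant'], ['dog']],
           'V': [['slept'], ['cried'], ['assaulted'],
                 ['devoured'], ['showed']]}

parse = [('S', 0), ('NP', 1), ('Det', 0), ('N', 0), ('VP', 1), ('V', 4),
         ('NP', 2), ('NP', 0), ('PN', 1), ('NP', 1), ('Det', 0), ('N', 3)]


def rule_for(entry):
    """Return the right-hand side selected by a (symbol, index) entry."""
    symbol, index = entry
    return grammar[symbol][index]


print(rule_for(('V', 4)))  # ['showed']

# Entries whose index is out of range for the grammar as written above.
invalid = [(s, i) for s, i in parse if i >= len(grammar[s])]
print(invalid)  # [('NP', 2)]
```

With the grammar exactly as posted, 'NP' has only two rules, so the entry ('NP', 2) cannot be looked up.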
The list of subtrees is:
[Tree('S', ['NP', 'VP']), Tree('NP', ['Det', 'N']), Tree('Det', ['the']), Tree('N', ['man']), Tree('VP', ['V', 'NP', 'NP']), Tree('V', ['showed']), Tree('NP', ['PN']), Tree('PN', ['Mary']), Tree('NP', ['Det', 'N']), Tree('Det', ['the']), Tree('N', ['dog'])]
I created the list of subtrees using the following code:
from nltk import Tree

trees = []
for item in parse:
    apple = Tree(item[0], grammar[item[0]][item[1]])
    trees.append(apple)
The output I got when I printed the trees (which I know isn't the correct method, but it at least shows the subtrees) is as follows:
(S NP VP)
(NP Det N)
(Det the)
(N man)
(VP V NP)
(V showed)
(NP NP NP)
(NP PN)
(PN Mary)
(NP Det N)
(Det the)
(N dog)
Thanks for the help!
::EDIT::
The correct output should look like this:
(S(NP(Det the)(N man))(VP(V showed)(NP(PN Mary))(NP(Det the)(N dog))))
You need to build the tree recursively, and you need to distinguish between terminals and non-terminals. By the way, your parse sequence seems wrong. I hacked this up:
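from nltk import Tree

def non_term(symbol):
    # Assumption: a symbol is a non-terminal if it is a key of the grammar
    return symbol in grammar

def build_tree(parse):
    # Returns (tree, remaining unconsumed part of the parse history)
    assert parse
    rule_head, rule_index = parse[0]
    rule_body = grammar[rule_head][rule_index]
    tree_body = []
    rest = parse[1:]
    for r in rule_body:
        if non_term(r):
            # Non-terminal: expand it using the next history entries
            subtree, rest = build_tree(rest)
            tree_body.append(subtree)
        else:
            # Terminal: keep the word itself as a leaf
            tree_body.append(r)
    return (Tree(rule_head, tree_body), rest)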
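As a rough end-to-end check of the recursive idea, here is a dependency-free sketch that builds plain (label, children) tuples instead of nltk Tree objects and renders them in bracketed form. It assumes the parse history is corrected by dropping the ('NP', 2) entry, which is my guess at the intended sequence rather than something the asker confirmed; the names `build` and `bracketed` are hypothetical.

```python
grammar = {'S': [['NP', 'VP']],
           'NP': [['PN'], ['Det', 'N']],
           'VP': [['V'], ['V', 'NP', 'NP']],
           'PN': [['John'], ['Mary'], ['Bill']],
           'Det': [['the'], ['a']],
           'N': [['man'], ['woman'], ['drill sergeant'], ['dog']],
           'V': [['slept'], ['cried'], ['assaulted'],
                 ['devoured'], ['showed']]}

# Assumed correction: the original history without the ('NP', 2) entry.
parse = [('S', 0), ('NP', 1), ('Det', 0), ('N', 0), ('VP', 1), ('V', 4),
         ('NP', 0), ('PN', 1), ('NP', 1), ('Det', 0), ('N', 3)]


def build(history):
    """Consume rules from the front of history; return (node, rest)."""
    (head, index), rest = history[0], history[1:]
    children = []
    for symbol in grammar[head][index]:
        if symbol in grammar:
            # Non-terminal: expand it with the next history entries.
            child, rest = build(rest)
            children.append(child)
        else:
            # Terminal: keep the word as a leaf.
            children.append(symbol)
    return (head, children), rest


def bracketed(node):
    """Render a (label, children) node as a bracketed string."""
    if isinstance(node, str):
        return node
    label, children = node
    return '(' + label + ' ' + ' '.join(bracketed(c) for c in children) + ')'


tree, leftover = build(parse)
result = bracketed(tree)
print(result)
```

The history is fully consumed (`leftover` is empty), and the printed string matches the expected output from the question apart from whitespace. If nltk is available, `Tree.fromstring(result)` should turn that string into a real Tree.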