How to flatten a parse tree and store it in a string for further string operations (Python NLTK)
Python's nltk provides functions for tree manipulation and node extraction:
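from nltk.tree import Tree

for tr in trees:  # trees: an iterable of nltk Tree objects
    tr1 = str(tr)              # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)  # parse the string back into a Tree
    s2 = s1.productions()      # grammar productions of the tree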
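As a small self-contained illustration of the round trip above, here is a sketch that builds a tree from a bracketed string and lists its productions and leaves. The sample parse and the variable names are made up for illustration; only Tree.fromstring, productions() and leaves() come from NLTK.

```python
from nltk.tree import Tree

# Hypothetical bracketed parse, used only for illustration.
bracketed = "(S (NP (PRP I)) (VP (VBP am) (ADJP (JJ cool))))"

tree = Tree.fromstring(bracketed)

# productions() yields one grammar rule per non-leaf node, top-down.
rules = [str(p) for p in tree.productions()]
for rule in rules:
    print(rule)

# leaves() returns the tokens in sentence order.
words = tree.leaves()
print(words)
```

Once the rules or leaves are plain strings, ordinary string operations apply to them.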
You can convert the tree to a string with str, then split and join it as follows:
parse_string = ' '.join(str(tree).split())
print(parse_string)
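To see why split and join flatten the output, the sketch below applies them to a multi-line bracketed string of the kind str(tree) can produce. The sample text is hypothetical and written by hand, not taken from a real parser, so the snippet runs without NLTK.

```python
# A multi-line bracketed parse, as str(tree) might render it
# (hypothetical sample text, not output from a real parser).
pretty = """(S
  (NP (PRP I))
  (VP
    (VBP am)
    (ADJP (JJ cool))))"""

# split() with no arguments breaks on any run of whitespace
# (spaces, tabs, newlines); join() puts single spaces back.
parse_string = " ".join(pretty.split())
print(parse_string)
```

This relies on leaves containing no internal whitespace; a leaf with embedded spaces would have those runs collapsed as well.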
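The documentation provides a pprint() method that renders the tree as a bracketed string. (In NLTK 3, pprint() prints the tree and pformat() returns the string.)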
Parsing this sentence:
string = "My name is Ross and I am cool. What's going on world? I'm looking for friends."
and then calling pprint() produces the following:
u"(NP+SBAR+S\n (S\n (NP (PRP$ my) (NN name))\n (VP\n (VBZ is)\n (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n (SBAR\n (WHNP (WP What))\n (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n (. ?))\n (S\n (NP (PRP I))\n (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n (. .)))"
From here, if you want to remove the tabs and newlines, you can use split and join:
splitted = tree.pformat().split()  # NLTK 3: pformat() returns the string; pprint() prints it and returns None
flat_tree = ' '.join(splitted)
Running this gives me:
u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))"
NLTK provides a function that does this directly:
flat_tree = tree._pformat_flat("", "()", False)
Both tree.pprint() and str(tree) call this method internally, but they add extra logic to split the output across multiple lines when needed.
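Since _pformat_flat is a private method (note the leading underscore), one possible alternative is to stay on the public API and pass a very large margin to pformat(), which in NLTK 3 should keep the one-line form. The sketch below is an illustration under that assumption, with a made-up sample tree; it also checks that the result matches the split and join approach.

```python
import sys
from nltk.tree import Tree

# Hypothetical sample tree, used only for illustration.
tree = Tree.fromstring(
    "(S (NP (PRP$ My) (NN name)) (VP (VBZ is) (NP (NNP Ross))) (. .))"
)

# Force the multi-line layout with a small margin, then flatten it by hand.
multi_line = tree.pformat(margin=20)
via_split = " ".join(multi_line.split())

# A very large margin lets pformat() keep its one-line form.
via_margin = tree.pformat(margin=sys.maxsize)

print(via_margin)
print(via_split == via_margin)
```

Either route gives a plain string; the margin variant avoids relying on an internal helper that may change between NLTK versions.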