How to flatten a parse tree and store it in a string for further string operations (Python, NLTK)
Python NLTK provides functions for tree manipulation and node extraction, for example:
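from nltk.tree import Tree

for tr in trees:                 # trees: an iterable of parse trees
    tr1 = str(tr)                # render the tree as bracketed text
    s1 = Tree.fromstring(tr1)    # parse that text back into a Tree
    s2 = s1.productions()        # list the CFG productions of the tree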
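To make the loop above runnable on its own, here is a minimal sketch that supplies a concrete value for `trees`. The single hand-written tree is a hypothetical stand-in for real parser output; everything else uses only `Tree.fromstring`, `str()` and `productions()` as in the question.

```python
from nltk.tree import Tree

# Hypothetical stand-in for the question's `trees` (normally parser output).
trees = [Tree.fromstring("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")]

for tr in trees:
    tr1 = str(tr)               # bracketed text; one line because it is short
    s1 = Tree.fromstring(tr1)   # round-trip back to an equal Tree
    s2 = s1.productions()       # productions in top-down (preorder) order
    print(tr1)
    for prod in s2:
        print(prod)
```

The round trip through `str` and `fromstring` yields a tree equal to the original, which is why the string form is a safe thing to store.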
You can use the str function to convert the tree to a string, then split and join it as follows:
parse_string = ' '.join(str(tree).split())  # collapse newlines and indentation
print(parse_string)
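The split/join trick works because `str.split()` with no argument splits on any run of whitespace (spaces, tabs, newlines), so rejoining with `' '` leaves exactly one space between tokens. The sketch below uses a hard-coded multi-line string in the layout NLTK uses for wide trees, so it runs without NLTK; in practice you would pass `str(tree)`. The helper name `flatten_tree_string` is hypothetical.

```python
def flatten_tree_string(tree_str):
    """Collapse a (possibly multi-line) bracketed parse onto one line."""
    # split() with no separator drops every run of whitespace,
    # including the newlines and indentation between subtrees.
    return " ".join(tree_str.split())


# Multi-line bracketed text, as str(tree) produces for long trees.
multi_line = """(S
  (NP (DT the) (NN dog))
  (VP (VBD barked)))"""

parse_string = flatten_tree_string(multi_line)
print(parse_string)  # (S (NP (DT the) (NN dog)) (VP (VBD barked)))
```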
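The documentation provides a pprint() method for rendering a tree as bracketed text. It keeps the tree on one line only if it fits within the margin (70 characters by default); otherwise it breaks it across several lines. In NLTK 3 the string-returning version is called pformat(), and pprint() prints the text instead of returning it.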
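A minimal sketch, assuming NLTK 3 (where `pformat()` returns the string): because the single-line form is used whenever the text fits within `margin`, passing a very large margin such as `sys.maxsize` gives a flat string directly. The tree is a shortened, hand-written bracketing of part of the example sentence, not real parser output.

```python
import sys

from nltk.tree import Tree

tree = Tree.fromstring(
    "(S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) "
    "(PRP I) (JJ am) (NN cool.))))"
)

default_text = tree.pformat()                # longer than margin=70, so it wraps
one_line = tree.pformat(margin=sys.maxsize)  # always fits, so no line breaks
print(default_text)
print(one_line)
```

Both approaches give the same flat string here; the margin variant simply avoids building the multi-line text first.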
Parse this sentence:
string = "My name is Ross and I am cool. What's going on world? I'm looking for friends."
Calling pprint() on the resulting tree then produces:
u"(NP+SBAR+S\n (S\n (NP (PRP$ my) (NN name))\n (VP\n (VBZ is)\n (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n (SBAR\n (WHNP (WP What))\n (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n (. ?))\n (S\n (NP (PRP I))\n (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n (. .)))"
From there, if you want to remove the newlines and indentation, you can use split and join (see here):
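splitted = tree.pformat().split()  # NLTK 2: tree.pprint().split(); NLTK 3's pprint() returns None
flat_tree = ' '.join(splitted)     # rejoin the tokens with single spaces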
Running this gives me:
u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))"
NLTK provides a method that does this directly:
flat_tree = tree._pformat_flat("", "()", False)  # private method; arguments are nodesep, parens, quotes
Both tree.pprint() and str(tree) call this method internally, but add extra logic to split the output across multiple lines when it is longer than the margin.
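The sketch below checks that the private `_pformat_flat` helper and the public split/join approach agree on a tree long enough that `str(tree)` wraps. It assumes NLTK 3, where the helper takes `(nodesep, parens, quotes)`; since the leading underscore marks it as private, its signature may change between versions.

```python
from nltk.tree import Tree

tree = Tree.fromstring(
    "(S (NP (PRP I)) (VP (VBP 'm) (VBG looking) "
    "(PP (IN for) (NP (NNS friends)))) (. .))"
)

# Private helper: no node separator, round parentheses, leaves unquoted.
flat_private = tree._pformat_flat("", "()", False)
# Public-API route: render (multi-line here), then collapse whitespace.
flat_public = " ".join(str(tree).split())

print(flat_private)
print(flat_private == flat_public)
```

If you would rather not depend on a private method, the split/join route uses only public behaviour and gives the same result as long as no leaf contains whitespace.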