How to convert an NLTK tree (Stanford) into Newick format in Python?
I have this Stanford parse tree and I want to convert it to Newick format.
(ROOT
  (S
    (NP (DT A) (NN friend))
    (VP
      (VBZ comes)
      (NP
        (NP (JJ early))
        (, ,)
        (NP
          (NP (NNS others))
          (SBAR
            (WHADVP (WRB when))
            (S (NP (PRP they)) (VP (VBP have) (NP (NN time))))))))))
There may be ways to do this with string processing alone, but I would parse the tree and then print it recursively in Newick format. A minimal implementation:
import re


class Tree(object):
    def __init__(self, label):
        self.label = label
        self.children = []

    @staticmethod
    def _tokenize(string):
        # Reverse the token list so that pop() consumes tokens front to back.
        return list(reversed(re.findall(r'\(|\)|[^ \n\t()]+', string)))

    @classmethod
    def from_string(cls, string):
        tokens = cls._tokenize(string)
        return cls._tree(tokens)

    @classmethod
    def _tree(cls, tokens):
        t = tokens.pop()
        if t == '(':
            # The token after '(' is the node label; subtrees follow until ')'.
            tree = cls(tokens.pop())
            for subtree in cls._trees(tokens):
                tree.children.append(subtree)
            return tree
        else:
            return cls(t)

    @classmethod
    def _trees(cls, tokens):
        # Use return rather than raise StopIteration: since Python 3.7
        # (PEP 479), raising StopIteration inside a generator is a RuntimeError.
        while True:
            if not tokens:
                return
            if tokens[-1] == ')':
                tokens.pop()
                return
            yield cls._tree(tokens)

    def to_newick(self):
        if self.children and len(self.children) == 1:
            # Collapse unary nodes (e.g. a POS tag over a word) into the child.
            return ','.join(child.to_newick() for child in self.children)
        elif self.children:
            return '(' + ','.join(child.to_newick() for child in self.children) + ')'
        else:
            return self.label
Note that information is, of course, lost in the conversion, because only the terminal nodes are kept. Usage:
>>> s = """(ROOT (..."""
>>> Tree.from_string(s).to_newick()
...
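If the phrase and part-of-speech labels should survive the conversion, one option is to write them as internal node labels, which Newick allows after a closing parenthesis. The following is a minimal sketch of that idea, not part of the answer above: the helper names `parse`, `quote` and `to_newick` are made up for illustration, and the quoting rule (single quotes around labels that contain Newick punctuation, with embedded quotes doubled) follows the common Newick convention, which not every parser honors. Unary nodes are kept as single-child nodes here, which some Newick readers may also reject.

```python
import re


def parse(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != '(':
            return (tok, [])           # leaf: a bare word
        label = tokens[pos]            # token after '(' is the node label
        pos += 1
        children = []
        while tokens[pos] != ')':
            children.append(node())
        pos += 1                       # consume the closing ')'
        return (label, children)

    return node()


def quote(label):
    """Single-quote labels that contain Newick punctuation (assumed convention)."""
    if re.search(r"[\s()\[\]':;,]", label):
        return "'" + label.replace("'", "''") + "'"
    return label


def to_newick(tree):
    """Emit Newick text, writing each internal label after its ')'."""
    label, children = tree
    if not children:
        return quote(label)
    return '(' + ','.join(to_newick(c) for c in children) + ')' + quote(label)


s = "(ROOT (S (NP (DT A) (NN friend)) (VP (VBZ comes) (, ,))))"
# A complete Newick string ends with a semicolon.
print(to_newick(parse(s)) + ';')
```

Quoting matters for this input because the Stanford tree contains a `(, ,)` node, and a bare comma would otherwise be read as a sibling separator.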