
How to recover the internal nodes of a Newick tree in a dictionary?

I have the following Newick tree: ((((A,B)1,C)2,((((D,E)3,F)4,G)5,(((((H,I)6, J)7,K)8,L)9,M)10)11)12,N)13;

where the letters are leaves and the numbers are internal nodes.

I would like to get the following dictionary:

{1:('A','B'),2:('C',1),3:('D','E'),4:('F',3)....}

which maps each internal node to its two children.
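One way to build this mapping directly, without first building a nested tree, is a single left-to-right scan that keeps a stack of the clades currently open. This is a swapped-in technique, not the code discussed in this thread, and it is only a sketch under stated assumptions: every internal node has a label immediately after its closing `)`, all-digit labels are internal nodes and are converted to `int`, and any `:length` branch lengths are skipped. The function name `internal_node_children` is hypothetical.

```python
import re


def internal_node_children(newick):
    """Return {internal_label: (child, child, ...)} for a fully labelled Newick tree.

    Sketch assumptions: every internal node is labelled right after its ')',
    all-digit labels become ints, and ':length' suffixes are ignored.
    """
    # A token is a ':length' suffix, one structural character, or a label.
    tokens = re.findall(r":[^(),;\s]*|[(),;]|[^(),;:\s]+", newick)
    stack = [[]]        # stack[-1] holds the children of the innermost open clade
    just_closed = None  # children of the clade whose ')' was just read
    result = {}
    for tok in tokens:
        if tok.startswith(":"):
            continue  # branch length: not needed for the mapping
        if tok in "(),;":
            if just_closed is not None:
                raise ValueError("internal node without a label")
            if tok == "(":
                stack.append([])           # open a new clade
            elif tok == ")":
                just_closed = stack.pop()  # its label is the next token
            continue                       # ',' and ';' only separate/terminate
        label = int(tok) if tok.isdigit() else tok
        if just_closed is not None:
            result[label] = tuple(just_closed)  # this label names the clade just closed
            just_closed = None
        stack[-1].append(label)  # leaf or labelled clade is a child of the enclosing clade
    return result
```

For the tree in the question this yields 13 entries, for example 2 → (1, 'C'), 11 → (5, 10) and 13 → (12, 'N'). Children are listed in the order they appear in the string, so node 2 maps to (1, 'C') rather than ('C', 1).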

I found this code on Stack Overflow:

import re

def parse(newick):
    # Each token is (name, branch length, delimiter, single char such as "(")
    tokens = re.findall(r"([^:;,()\s]*)(?:\s*:\s*([\d.]+)\s*)?([,);])|(\S)", newick)

    def recurse():
        children = []
        name, length, delim, ch = tokens.pop(0)
        if ch == "(":
            # Read children until one of them is followed by ")"
            while ch in "(,":
                node, ch = recurse()
                children.append(node)
            # The token after ")" holds this clade's own name
            name, length, delim, ch = tokens.pop(0)
        return {"name": name,"children": children}, delim

    return recurse()[0]
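To see what `recurse()` consumes, it helps to look at the raw token stream. The sketch below uses the regex with a `:` before the optional branch length, which is how Newick writes lengths (e.g. `A:0.5`); the small tree is a made-up example. A tuple whose last slot is `(` opens a clade, a delimiter of `,` means a sibling follows, and after a `)` the next tuple carries the clade's own label.

```python
import re

# Tokenizer regex of parse(), with ':' before the optional branch length.
TOKEN_RE = r"([^:;,()\s]*)(?:\s*:\s*([\d.]+)\s*)?([,);])|(\S)"

# Each match is a 4-tuple (name, length, delimiter, char); groups that did
# not take part in the match come back as ''.
print(re.findall(TOKEN_RE, "(A:0.5,B)1;"))
# [('', '', '', '('), ('A', '0.5', ',', ''), ('B', '', ')', ''), ('1', '', ';', '')]
```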

but I don't know how to adapt it to this problem.

Thanks,

Replace

return {"name": name,"children": children}, delim

with just

return {name: children}, delim
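A self-contained sketch of `parse()` with that one-line change, run on a small hypothetical tree `((A,B)1,C)2;`, shows the nested shape that the next function walks: each node becomes a one-item dict from its name (still a string) to a list of child dicts, and leaves have an empty list. The regex here assumes the `:` form for branch lengths.

```python
import re


def parse(newick):
    """parse() with each node returned as {name: children}."""
    tokens = re.findall(r"([^:;,()\s]*)(?:\s*:\s*([\d.]+)\s*)?([,);])|(\S)", newick)

    def recurse():
        children = []
        name, length, delim, ch = tokens.pop(0)
        if ch == "(":
            # Collect children until the delimiter after a child is ")"
            while ch in "(,":
                node, ch = recurse()
                children.append(node)
            # The token after ")" carries this clade's label and delimiter
            name, length, delim, ch = tokens.pop(0)
        return {name: children}, delim

    return recurse()[0]


print(parse("((A,B)1,C)2;"))
# {'2': [{'1': [{'A': []}, {'B': []}]}, {'C': []}]}
```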

Then define a new function:

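def alternative_newick(treedata, zero_root=False):
    # Maps each internal node to a tuple of its children
    result = {}

    def build_node(parent, name, children):
        # An unlabelled node ("" from parse()) is recorded as 0
        if parent == "":
            parent = 0
        if name.isnumeric():
            name = int(name)

        # Recurse into the children first, then register this node
        for child in children:
            if child:
                build_node(name, *child.popitem())

        result.setdefault(parent, []).append(name)

    tree = parse(treedata)
    build_node(None, *tree.popitem())

    # Drop the artificial entry for the root's parent (None) and, unless
    # requested, the placeholder 0 used for an unlabelled root.
    result.pop(None)
    if not zero_root:
        result.pop(0, None)  # default avoids a KeyError when the root is labelled
    for k, v in result.items():
        result[k] = tuple(v)

    return result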
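`build_node()` takes the nested dict apart with `popitem()`, so the parsed tree is emptied as it is converted. If the tree is still needed afterwards, a read-only recursive walk is one alternative. `children_map` below is a hypothetical helper, not part of the answer; it assumes the `{name: [child_dict, ...]}` shape produced by the modified `parse()`, converts all-digit names to `int`, and skips unnamed nodes (name `''`) while still visiting their children.

```python
def children_map(tree):
    """Flatten a nested {name: [child_dict, ...]} tree into
    {internal_name: (child_name, ...)} without modifying the input."""
    result = {}

    def label(name):
        # All-digit names are internal node numbers in this question
        return int(name) if name.isdigit() else name

    def walk(node):
        # Each node dict has exactly one item: its name and its children
        (name, children), = node.items()
        if children and name:
            result[label(name)] = tuple(label(next(iter(c))) for c in children)
        for child in children:
            walk(child)

    walk(tree)
    return result


print(children_map({'2': [{'1': [{'A': []}, {'B': []}]}, {'C': []}]}))
# {2: (1, 'C'), 1: ('A', 'B')}
```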

and use it like this:

treedata = "(((A,B)1,C)2,((((D,E)3,F)4,G)5,(((((H,I)6,J)7,K)8,L)9,M)10)11)12"
print(alternative_newick(treedata))

The result will be:

{1: ('A', 'B'), 2: (1, 'C'), 3: ('D', 'E'), 4: (3, 'F'), 5: (4, 'G'), 11: (5, 10), 6: ('H', 'I'), 7: (6, 'J'), 8: (7, 'K'), 9: (8, 'L'), 10: (9, 'M')}

The full code is here.
