Count number of groups (with a specific TAG) within a specific format (with Python)

Hello everyone, I need some help:

I do not know if you are familiar with phylogenetic trees, but here is an example:

   /-YP_001604167.1
  |
  |--YP_001604351.1
--|
  |      /-seq_TAG2_Canis_taurus
  |   /-|
  |  |   \-seq_TAG2_Canis_austracus
   \-|
     |   /-YP_001798528.1
      \-|
        |   /-YP_009173671.1
         \-|
           |   /-seq_TAG1_Mus_musculus
            \-|
              |   /-seq_TAG1_Mus_griseus
               \-|
                 |   /-seq_TAG2_Canis_canis
                  \-|
                    |   /-seq_TAG2_Canis_familiaris
                     \-|
                        \-seq_TAG2_Canis_lupus

This tree is encoded in a specific format called Newick:

'(YP_001604167.1,YP_001604351.1,((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));'
  • Explanation of the format:

The tree ends with a semicolon. The bottommost node in this tree is an interior node, not a tip. Interior nodes are represented by a pair of matched parentheses. Between them are representations of the nodes (seq_names) that are immediately descended from that node, separated by commas.

So if I have something like:

(A,(B,C)); 

Then it means that B and C are more closely related to each other, and A is the most distant.
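To make the nesting concrete, here is a minimal sketch of a parser that turns such a string into nested Python lists. It assumes a names-only Newick string (no branch lengths, no internal node labels, no quoted names); `parse_newick` is a hypothetical helper name, not part of any library.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested lists of leaf names."""
    # Assumption: no branch lengths, quotes, or internal labels are present.
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return children
        # A leaf: read characters up to the next "," or ")".
        start = pos
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        return text[start:pos]

    return parse_node()


tree = parse_newick("(A,(B,C));")
print(tree)  # ['A', ['B', 'C']]
```

B and C end up in the same inner list, which is exactly the "more closely related" relationship described above.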

The idea of my question is to find a way, for instance with Python, to count the number of groups sharing the same "TAG_number" whose members are closer to each other than to any other TAG_number or YP_number node.

For instance, TAG2 is represented in 2 groups: (seq_TAG2_Canis_taurus, seq_TAG2_Canis_austracus) are together, and the second group (seq_TAG2_Canis_canis, (seq_TAG2_Canis_familiaris, seq_TAG2_Canis_lupus)) are together. For TAG1, as you can see, none of the sequences are nested together, because seq_TAG1_Mus_griseus is closer to the group (seq_TAG2_Canis_canis, (seq_TAG2_Canis_familiaris, seq_TAG2_Canis_lupus)) than it is to the other TAG1 sequence, seq_TAG1_Mus_musculus.

So the result should be something like:

groups for TAG_1 : 0 
groups for TAG_2 : 2 
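One way to read this requirement is: a "group" is a maximal clade with at least two leaves that all carry the same tag, so isolated leaves do not count. Under that assumption, the counting can be sketched in plain Python. The nested tuple `GENE_TREE` is a hand transcription of the Newick string above, and `tag_of` and `count_groups` are hypothetical helper names.

```python
import re
from collections import Counter

# Hand transcription of the Newick string from the question (assumption).
GENE_TREE = (
    "YP_001604167.1",
    "YP_001604351.1",
    (
        ("seq_TAG2_Canis_austracus", "seq_TAG2_Canis_taurus"),
        ("YP_001798528.1",
         ("YP_009173671.1",
          ("seq_TAG1_Mus_musculus",
           ("seq_TAG1_Mus_griseus",
            ("seq_TAG2_Canis_lupus",
             ("seq_TAG2_Canis_familiaris", "seq_TAG2_Canis_canis")))))),
    ),
)


def tag_of(name):
    """Return 'TAG1', 'TAG2', ... or None for untagged (YP_*) leaves."""
    match = re.search(r"TAG[0-9]+", name)
    return match.group(0) if match else None


def count_groups(tree):
    """Count maximal same-tag clades (two or more leaves) per tag."""
    counts = Counter()

    def record(node, tags):
        # Count only internal nodes whose leaves all share one real tag.
        if not isinstance(node, str) and len(tags) == 1:
            (tag,) = tags
            if tag is not None:
                counts[tag] += 1

    def visit(node):
        """Return the set of tags found below node."""
        if isinstance(node, str):
            return {tag_of(node)}
        child_tags = [visit(child) for child in node]
        tags = set().union(*child_tags)
        if len(tags) > 1:
            # This node is mixed, so each pure child clade is maximal.
            for child, ctags in zip(node, child_tags):
                record(child, ctags)
        return tags

    record(tree, visit(tree))  # covers the case where the whole tree is pure
    return counts


counts = count_groups(GENE_TREE)
for tag in ("TAG1", "TAG2"):
    print(f"groups for {tag} : {counts[tag]}")
```

A `Counter` returns 0 for tags that never form a group, which is why TAG1 is reported as 0 without special handling.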

I know that some packages in Python or R can tell whether TAG_number sequences form "monophyletic groups", but I found nothing that reports the number of groups when the TAG_number sequences are split across the tree.

Do you have any idea how to do that? Thank you very much.

Other part of the question:

Now I have a species phylogeny such as:

|         /-Canis_taurus
|      /-|
|     |   \-Canis_austracus
|   /-|
|  |  |   /-Canis_africus
|  |   \-|
|  |     |   /-Canis_familiaris
 \-|      \-|
   |         \-Canis_lupus
   |
   |   /-Canis_canis
    \-|
       \-Lupus_lupus

The idea is, for each monophyletic group found in the previous step, to take the MRCA of its species in the species phylogeny and count the number of species (leaf nodes) in the clade formed by that MRCA.

So I have 2 groups:

The first:

#    /-TAG2, seq_TAG2_Canis_austracus
# --|
#    \-TAG2, seq_TAG2_Canis_taurus
#

Here Canis_austracus and Canis_taurus share an MRCA in the species phylogeny, and this ancestor forms a clade composed of 2 species (Canis_austracus and Canis_taurus).

So Nb species within species phylogenetic tree = 2

#    /-TAG2, seq_TAG2_Canis_lupus
# --|
#   |   /-TAG2, seq_TAG2_Canis_familiaris
#    \-|
#       \-TAG2, seq_TAG2_Canis_canis

Here the 3 taxa share an MRCA, and this ancestor forms the clade composed of all species in the species phylogeny (7).

So Nb species within species phylogenetic tree = 7
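The two counts above can be reproduced with a small sketch in plain Python. It makes several assumptions: `SPECIES_TREE` is a hand transcription of the ASCII species phylogeny (with the species spelled Canis_austracus, as in the gene tree), sequence names always follow the pattern seq_TAGn_Genus_species, and `leaves`, `mrca_clade` and `species_of` are hypothetical helper names.

```python
# Hand transcription of the ASCII species phylogeny (assumption).
SPECIES_TREE = (
    (
        ("Canis_taurus", "Canis_austracus"),
        ("Canis_africus", ("Canis_familiaris", "Canis_lupus")),
    ),
    ("Canis_canis", "Lupus_lupus"),
)


def leaves(node):
    """Return the set of leaf names below node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def mrca_clade(node, targets):
    """Return the smallest subtree containing every target, or None."""
    if not targets <= leaves(node):
        return None
    if not isinstance(node, str):
        for child in node:
            found = mrca_clade(child, targets)
            if found is not None:
                return found  # a deeper clade already holds all targets
    return node  # no single child holds them all, so this node is the MRCA


def species_of(seq_name):
    """'seq_TAG2_Canis_lupus' -> 'Canis_lupus' (assumed naming pattern)."""
    return seq_name.split("_", 2)[2]


groups = [
    ["seq_TAG2_Canis_austracus", "seq_TAG2_Canis_taurus"],
    ["seq_TAG2_Canis_lupus", "seq_TAG2_Canis_familiaris",
     "seq_TAG2_Canis_canis"],
]

sizes = []
for group in groups:
    targets = {species_of(name) for name in group}
    clade = mrca_clade(SPECIES_TREE, targets)
    sizes.append(len(leaves(clade)))
    print(f"Nb species within species phylogenetic tree = {sizes[-1]}")
```

Recomputing `leaves` at every level is quadratic in the worst case, which is fine for a tree of this size; a larger tree would benefit from caching the leaf sets.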

Maybe get_monophyletic of ete3 is what you need? http://etetoolkit.org/docs/latest/reference/reference_tree.html?highlight=get_monophyletic#ete3.TreeNode.get_monophyletic

from ete3 import Tree
import re

# build tree
t = Tree("(YP_001604167.1,YP_001604351.1,"
         "((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),"
         "(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,"
         "(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,"
         "(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));")

# set tag as leaf attribute
for leaf in t:
    # get tag from name
    tag = re.search('TAG[0-9]', leaf.name)
    tag = tag.group(0) if tag else None
    leaf.add_features(tag=tag)

# show the whole tree
print(t.get_ascii(attributes=["name", "tag"], show_internal=False))

# show all monophyletic groups for tag=TAG2
for node in t.get_monophyletic(values=["TAG2"], target_attr="tag"):
    print(node.get_ascii(attributes=["tag", "name"], show_internal=False))


#    /-TAG2, seq_TAG2_Canis_austracus
# --|
#    \-TAG2, seq_TAG2_Canis_taurus
#
#    /-TAG2, seq_TAG2_Canis_lupus
# --|
#   |   /-TAG2, seq_TAG2_Canis_familiaris
#    \-|
#       \-TAG2, seq_TAG2_Canis_canis
