Count number of groups (with specific TAG) within a specific format (with Python)
Hello everyone, I need some help:
I do not know if you are familiar with phylogenetic trees, but here is an example:
   /-YP_001604167.1
  |
  |--YP_001604351.1
--|
  |      /-seq_TAG2_Canis_taurus
  |   /-|
  |  |   \-seq_TAG2_Canis_austracus
   \-|
     |   /-YP_001798528.1
      \-|
        |   /-YP_009173671.1
         \-|
           |   /-seq_TAG1_Mus_musculus
            \-|
              |   /-seq_TAG1_Mus_griseus
               \-|
                 |   /-seq_TAG2_Canis_canis
                  \-|
                    |   /-seq_TAG2_Canis_familiaris
                     \-|
                        \-seq_TAG2_Canis_lupus
This tree is encoded in a format called Newick:
'(YP_001604167.1,YP_001604351.1,((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));'
The tree ends with a semicolon. The bottommost node in this tree is an interior node, not a tip. Interior nodes are represented by a pair of matched parentheses. Between them are representations of the nodes (seq_names) that are immediately descended from that node, separated by commas.
So if I have something like:
(A,(B,C));
then it means that B and C are more closely related to each other, and A is the most distant.
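To make the nesting concrete, here is a minimal sketch of a Newick reader in plain Python. It assumes the simplified form used in this question (leaf names only, no branch lengths, no internal node labels, no quoted names), and `parse_newick` is a hypothetical helper name, not a function from any library.

```python
def parse_newick(text):
    """Parse a name-only Newick string into nested lists of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(parse_node())
            pos += 1                      # skip ")"
            return children
        # A leaf: read characters up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return parse_node()


tree = parse_newick("(A,(B,C));")
print(tree)  # ['A', ['B', 'C']]
```

Each inner list is one interior node, so B and C sharing a list is exactly what "more closely related" means here.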
The idea of my question is to find a way, for instance with Python, to count the number of groups with the same "TAG_number" whose members are closer to each other than to any other TAG_number or YP_number node.
For instance, TAG2 is represented in 2 groups: (seq_TAG2_Canis_taurus, seq_TAG2_Canis_austracus) form the first group, and (seq_TAG2_Canis_canis, (seq_TAG2_Canis_familiaris, seq_TAG2_Canis_lupus)) form the second. For TAG1, as you can see, none of the sequences are grouped together, because seq_TAG1_Mus_griseus is closer to the group (seq_TAG2_Canis_canis, (seq_TAG2_Canis_familiaris, seq_TAG2_Canis_lupus)) than it is to the other TAG1 sequence, seq_TAG1_Mus_musculus.
So the result should be something like:
groups for TAG_1 : 0
groups for TAG_2 : 2
I know that some packages in Python or R can tell whether TAG_number sequences form "monophyletic groups", but nothing tells you the number of groups within the tree when the TAG_number groups are split across the tree.
Do you have any idea how to do that? Thank you very much.
Other part of the question:
Now I have a species phylogeny such as:
  |         /-Canis_taurus
  |      /-|
  |     |   \-Canis_astracus
  |   /-|
  |  |  |   /-Canis_africus
  |  |   \-|
  |  |     |   /-Canis_familiaris
   \-|      \-|
     |         \-Canis_lupus
     |
     |   /-Canis_canis
      \-|
         \-Lupus_lupus
For each of the monophyletic groups identified in the previous step, the idea is to find the MRCA of the group's species in the species phylogeny and to count the number of nodes (species) within the clade formed by that MRCA.
So I have 2 groups:
The first:
#    /-TAG2, seq_TAG2_Canis_austracus
# --|
#    \-TAG2, seq_TAG2_Canis_taurus
#
Here Canis_austracus and Canis_taurus share an MRCA in the species phylogeny, and this ancestor forms a clade composed of 2 species (Canis_austracus and Canis_taurus).
So the number of species within the species phylogenetic tree = 2
The second:
#    /-TAG2, seq_TAG2_Canis_lupus
# --|
#   |   /-TAG2, seq_TAG2_Canis_familiaris
#    \-|
#       \-TAG2, seq_TAG2_Canis_canis
Here the 3 taxa share an MRCA, and this ancestor forms a clade composed of all the species in the species phylogeny (7).
So the number of species within the species phylogenetic tree = 7
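The two counts above can be reproduced with a short standard-library sketch. Several things here are assumptions rather than facts from the question: the species Newick string is a transcription of the drawing above (with Canis_astracus spelled Canis_austracus so that it matches the sequence names), the species name is taken to be everything after the second underscore of a sequence name, and `species_of` and `mrca_clade_size` are hypothetical helper names.

```python
import re


def species_of(seq_name):
    """'seq_TAG2_Canis_lupus' -> 'Canis_lupus' (assumed naming scheme)."""
    return seq_name.split("_", 2)[2]


def mrca_clade_size(newick, species):
    """Return the number of leaves under the MRCA of `species` in a Newick tree."""
    wanted = set(species)
    stack = []  # one list per open "(", holding the leaf sets of its children
    # Clades close from the tips toward the root, so the first clade that
    # contains every wanted species is the smallest one: the MRCA clade.
    for token in re.findall(r"[(),;]|[^(),;]+", newick):
        token = token.strip()
        if not token or token in (",", ";"):
            continue
        if token == "(":
            stack.append([])
            continue
        if token == ")":
            leaves = frozenset().union(*stack.pop())
        else:
            leaves = frozenset([token])
        if wanted <= leaves:
            return len(leaves)
        if stack:
            stack[-1].append(leaves)
    raise ValueError("not all species are present in the tree")


# Assumed transcription of the species phylogeny drawn above.
SPECIES_NEWICK = ("(((Canis_taurus,Canis_austracus),"
                  "(Canis_africus,(Canis_familiaris,Canis_lupus))),"
                  "(Canis_canis,Lupus_lupus));")

group1 = ["seq_TAG2_Canis_austracus", "seq_TAG2_Canis_taurus"]
group2 = ["seq_TAG2_Canis_lupus", "seq_TAG2_Canis_familiaris", "seq_TAG2_Canis_canis"]

size1 = mrca_clade_size(SPECIES_NEWICK, [species_of(s) for s in group1])
size2 = mrca_clade_size(SPECIES_NEWICK, [species_of(s) for s in group2])
print(size1, size2)  # 2 7
```

With a tree library already loaded, the same idea would be to take the common ancestor of the group's species and count its leaves; the sketch only avoids the dependency.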
Maybe get_monophyletic from ete3 is what you need? http://etetoolkit.org/docs/latest/reference/reference_tree.html?highlight=get_monophyletic#ete3.TreeNode.get_monophyletic
import re

from ete3 import Tree

# build tree
t = Tree("(YP_001604167.1,YP_001604351.1,"
         "((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),"
         "(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,"
         "(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,"
         "(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));")

# set the tag as a leaf attribute
for leaf in t:
    # get the tag from the leaf name (None if the name carries no tag)
    tag = re.search('TAG[0-9]', leaf.name)
    tag = tag.group(0) if tag else None
    leaf.add_features(tag=tag)

# show the whole tree
print(t.get_ascii(attributes=["name", "tag"], show_internal=False))

# show all monophyletic groups for tag=TAG2
for node in t.get_monophyletic(values=["TAG2"], target_attr="tag"):
    print(node.get_ascii(attributes=["tag", "name"], show_internal=False))

#    /-TAG2, seq_TAG2_Canis_austracus
# --|
#    \-TAG2, seq_TAG2_Canis_taurus
#
#    /-TAG2, seq_TAG2_Canis_lupus
# --|
#   |   /-TAG2, seq_TAG2_Canis_familiaris
#    \-|
#       \-TAG2, seq_TAG2_Canis_canis
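The code above prints the monophyletic TAG2 clades but does not count them per tag. As a rough sketch of the counting step, the following uses only the standard library rather than ete3. It rests on assumptions that are not stated in the question: a "group" is taken to be a maximal clade of at least two leaves that all carry the same tag (which is what yields 0 for TAG1), tags are extracted with the pattern `TAG[0-9]+`, and the Newick string contains leaf names only. `count_tag_groups` is a hypothetical name.

```python
import re
from collections import Counter


def count_tag_groups(newick, tag_pattern=r"TAG[0-9]+"):
    """Count, per tag, the maximal clades of >= 2 leaves carrying only that tag."""
    counts = Counter()
    stack = []   # one list per open "(", holding (tag, n_leaves) child summaries
    root = None

    def summarize(children):
        tags = {tag for tag, _ in children}
        n_leaves = sum(n for _, n in children)
        if len(tags) == 1 and None not in tags:
            return tags.pop(), n_leaves      # uniform clade, may still grow
        for tag, n in children:              # mixed clade: uniform children are maximal
            if tag is not None and n >= 2:
                counts[tag] += 1
        return None, n_leaves

    for token in re.findall(r"[(),;]|[^(),;]+", newick):
        token = token.strip()
        if not token or token in (",", ";"):
            continue
        if token == "(":
            stack.append([])
            continue
        if token == ")":
            node = summarize(stack.pop())
        else:
            match = re.search(tag_pattern, token)
            node = (match.group(0) if match else None, 1)
            if node[0] is not None:
                counts[node[0]] += 0         # register the tag even with no group
        if stack:
            stack[-1].append(node)
        else:
            root = node

    if root is not None and root[0] is not None and root[1] >= 2:
        counts[root[0]] += 1                 # the whole tree is one uniform group
    return dict(counts)


newick = ("(YP_001604167.1,YP_001604351.1,"
          "((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),"
          "(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,"
          "(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,"
          "(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));")

for tag, n in sorted(count_tag_groups(newick).items()):
    print(f"groups for {tag} : {n}")
# groups for TAG1 : 0
# groups for TAG2 : 2
```

If single tagged leaves should also count as groups, dropping the `n >= 2` checks would be the only change needed.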