简体   繁体   中英

(slow edge coloring) how to avoid looping all tip.labels (package ape)

I am coloring edges in phylo trees generated with some functions from "ape"

Since I've always programmed in C, I still find it hard to stop thinking in a loop-like manner.

The only way I can think of doing this is by (1) looping all the tip.labels (IDs), (2) finding out which edges belong to them and (3) setting the desired color.

This is done 1 by 1 and therefore is incredibly slow for big trees:

tsampltime.rooted=structure(list(edge = structure(c(24L, 24L, 24L, 24L, 24L, 25L, 
26L, 26L, 27L, 27L, 28L, 28L, 25L, 29L, 29L, 30L, 30L, 30L, 30L, 
24L, 31L, 31L, 32L, 32L, 32L, 33L, 33L, 34L, 35L, 35L, 34L, 36L, 
36L, 34L, 37L, 37L, 1L, 2L, 12L, 23L, 25L, 26L, 6L, 27L, 5L, 
28L, 3L, 4L, 29L, 7L, 30L, 8L, 9L, 10L, 11L, 31L, 13L, 32L, 21L, 
22L, 33L, 20L, 34L, 35L, 14L, 15L, 36L, 16L, 17L, 37L, 18L, 19L
), .Dim = c(36L, 2L)), Nnode = 14L, tip.label = c("0", "2325", 
"55304", "124953", "72254", "66507", "85089", "110256", "123265", 
"97350", "123721", "36770", "48692", "110612", "97224", "104337", 
"124625", "128499", "120928", "88404", "73335", "75059", "17928"
), edge.length = c(0, 0.953297, 8.054944, 4.4120893, 9.173083, 
1.409346, 3.752752, 0.483517, 4.620875, 0.582417, 0.510989, 12.4862723, 
6.291209, 1.920329, 3.071429, 4.5027528, 5.497248, 2.777472, 
5.5274749, 8.414843, 2.5467017, 3.79121, 3.824171, 3.961538, 
3.804944, 2.126375, 1.75275, 1.93956, 3.3516546, 1.57418, 2.31319, 
2.22528, 4.0384651, 3.898348, 2.722523, 1.87088)), .Names = c("edge", 
"Nnode", "tip.label", "edge.length"), class = "phylo", order = "cladewise")
   ... 
#distValuesPerId[,] has [LABELID,COLOR]
distValuesPerId=source('http://ubuntuone.com/5y7ZYCWfE73T5lhnUpmeXc')
...
uniqueIDs=unique(tree$tip.label)
distTrdsampledcol <-rep("black", length(tree$edge)) #init in black
for(i in uniqueIDs) { #(1)
    a= c(which(tree$tip.label==i)) 
    b= which(tree$edge[,2]== a) #(2)
    distTrdsampledcol [ b ] <- distValuesPerId[i,2] #(3)
}
...
#plot(tree, edge.color=distTrdsampledcol)

Can anyone help me to rethink this? Is there a more efficient of doing this?

在此输入图像描述

Thanks in advance!

j

You might be overthinking this. Just subselect the colours you need from your giant colour data.frame .

plot(tree,edge.color=distValuesPerId[tree$tip.label,2])

Try out the examples list on ?plot.phylo , they have a ton of examples of really cool things you can do with your trees, including colouring. It might give you some ideas.


After seeing your comment, I realized, that I had misunderstood the question. This should do what you want without loops:

cols=distValuesPerId[match(tree$tip.label[tree$edge[,2]],distValuesPerId[,1]),2]
my.cols=ifelse(is.na(cols),'black',cols)
plot(tree, edge.color=my.cols)

Breaking it down:

# Find the tip labels associated with each edge, NA if it is not an edge to a tip
edge.tip.labels=tree$tip.label[tree$edge[,2]]
# Match each of those tip labels to the label column in your colur data frame
edge.rows=match(edge.tip.labels,distValuesPerId[,1])
# Find the colour for each of those rows
cols=distValuesPerId[edge.rows,2]
# Where it is NA, convert it to 'black' (where it is not a 'tip edge')
my.cols=ifelse(is.na(cols),'black',cols)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM