I am working with a graph object (igraph package) in R. I apply a function called get.shortest.paths(), which returns the shortest paths from a given vertex to all the other vertices in the graph. The result is a list in which each element corresponds to a target vertex and contains the indices of all the vertices on the shortest path from the source to that target. For example:
head(get.shortest.paths(graph, from = V(graph)[1], to = V(graph), mode = "out"))
[[1]]
[1] 0    # source and target are the same
[[2]]
[1] 0 91835 38405 89704 1
[[3]]
[1] 0 91835 12104 39002 22670 2
[[4]]
[1] 0 62386 36754 89246 31045 3
The problem comes when I want to go from vertex indices to vertex names, to get something like this:
[[1]]
[1] "gene 1"
[[2]]
[1] "gene 1" "protein 45" "protein 83" "protein 70" "gene 2"
[[3]]
[1] "gene 1" "protein 45" "protein 30" "reaction 2" "protein 404" "gene 3"
[[4]]
[1] "gene 1" "protein 4" "reaction 12" "protein 19" "protein 494" "gene 4"
I tried to do this with lapply():
path.index.list <- get.shortest.paths(graph, from = V(graph)[1], to = V(graph), mode = "out")
path.name.list <- lapply(path.index.list, FUN = function(path) V(graph)[path]$name)
... but this takes a very long time, and "for" loops are just as slow. Here is the exact time it took to convert from indices to names for just one source vertex to all the other 100,000+ vertices:
system.time(lapply(path.index.list, FUN = function(path) V(graph)[path]$name))
user system elapsed
608.62 152.69 761.66
... which comes to about 900 days for the whole graph.
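A quick check of that estimate, assuming one conversion run of 761.66 s per source vertex and exactly 100,000 source vertices (the real count is somewhat larger, so this is a lower bound):

```latex
\frac{761.66\ \text{s} \times 10^{5}}{86\,400\ \text{s/day}}
  = \frac{7.6166 \times 10^{7}\ \text{s}}{86\,400\ \text{s/day}}
  \approx 882\ \text{days}
```

This is consistent with the "about 900 days" figure.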
Is this one of those "pass-by-reference" vs. "pass-by-value" problems, and if so, can someone help me understand how to solve it? I have heard of using hashes or environments in R to solve things like this; can anyone comment on that? I have also heard that some R packages can help with this.
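On the hashes/environments question, here is a small base-R sketch with hypothetical toy names standing in for V(graph)$name. Turning an integer index into a name is a plain vector subset and needs no hash table. A hashed environment (new.env(hash = TRUE)) only helps for the reverse, string-keyed lookup, and there base R's match() does the same job in one vectorized call. As for pass-by-value: R copies an object only when a function modifies it, so merely reading a names vector inside lapply should not copy it.

```r
# Hypothetical toy vertex names (stand-in for V(graph)$name).
vertex_names <- c("gene 1", "protein 45", "protein 83", "protein 70", "gene 2")

# Index -> name: integer positions subset a character vector directly.
# This is already a direct lookup per element; no hash table is needed.
by_vector <- vertex_names[c(1L, 2L, 3L, 4L, 5L)]

# Name -> index: here a hash can help. An environment created with
# hash = TRUE behaves like a string-keyed hash map.
name_to_index <- new.env(hash = TRUE)
for (i in seq_along(vertex_names)) {
  assign(vertex_names[i], i, envir = name_to_index)
}
idx <- unlist(mget(c("protein 83", "gene 2"), envir = name_to_index))

# Base R's match() gives the same answer for a batch lookup.
idx_match <- match(c("protein 83", "gene 2"), vertex_names)
```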
Basically, how can I solve this without having to code in C?
Query the names of the vertices once, in advance, and index into that vector inside lapply:
names <- V(graph)$name
lapply(path.index.list, FUN = function(path) names[path])
I guess this is going to be much faster, because lapply won't have to build V(graph) and the name list every time just to select a subset of it.
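A self-contained sketch of this idea that runs without igraph: the toy vertex_names vector and path.index.list below are hypothetical stand-ins for V(graph)$name and the result of get.shortest.paths(). It assumes 1-based vertex indices, as in igraph 0.6 and later to my knowledge. The 0-based output in the question suggests an older igraph; with those indices, subsetting a plain R vector would silently drop the 0 and shift every other name by one, so you would need vnames[path + 1] instead.

```r
# Hypothetical stand-in for the igraph result: a list of integer index vectors
# (1-based, one vector per target vertex).
vertex_names <- c("gene 1", "gene 2", "gene 3", "protein 45", "protein 83")
path.index.list <- list(1L, c(1L, 4L, 5L, 2L), c(1L, 4L, 3L))

# Look the names up once, outside the loop.
# With igraph this would be: vnames <- V(graph)$name
vnames <- vertex_names

# Each lapply call is now just a cheap integer subset of a character vector.
path.name.list <- lapply(path.index.list, function(path) vnames[path])
```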
Yes, I originally used the lapply method described by user "Tamás". I am getting about 230 seconds per iteration (about 2 seconds per 1,000 items). I tried using the "fastmatch" package combined with preallocating memory using matrices, and it actually got slower. I took this to mean that the issue is more how fast R looks up items than memory. I need to get this down to less than 6 seconds per iteration for this to actually be practical. I guess I'm going to C...
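If one R function call per path is still too slow, one option (a swapped-in technique, not something proposed in this thread, and not benchmarked here on a 100,000-vertex graph) is to drop the per-path call entirely: flatten all paths with unlist(), translate every index to a name in a single vectorized subset, and split() the result back into a list. The toy data are hypothetical; with igraph you would use vnames <- V(graph)$name and the real path list, with the same 1-based index caveat as before.

```r
# Hypothetical toy data standing in for V(graph)$name and the path list.
vnames <- c("gene 1", "gene 2", "gene 3", "protein 45", "protein 83")
path.index.list <- list(1L, c(1L, 4L, 5L, 2L), c(1L, 4L, 3L))

# 1. Flatten every path into one long integer vector.
all.idx <- unlist(path.index.list, use.names = FALSE)

# 2. Translate all indices to names in one vectorized subset.
all.names <- vnames[all.idx]

# 3. Record which path each element came from. Fixing the factor levels keeps
#    empty paths (unreachable targets) as character(0) instead of dropping them.
path.id <- factor(rep(seq_along(path.index.list), lengths(path.index.list)),
                  levels = seq_along(path.index.list))

# 4. Split the names back into one character vector per path.
path.name.list <- unname(split(all.names, path.id))
```

The whole conversion then costs a fixed handful of vectorized calls instead of one interpreted function call per target, which is typically where lapply spends its time on very long lists.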