I'm working on a project in R. Given a large data set of edges between nodes, the task is to test whether a set of test edges is real or not. The basic element of the project is the edge, so we need a per-edge measure to tell whether a given edge is real. Here is the problem. We've created a data frame with two columns, the "from" nodes and the "to" nodes, to represent the edges; it is called edgesData.
We then created a graph from it using igraph, called graph.
We can calculate the similarity of two particular nodes using
similarity.jaccard(graph, vids = V(graph)[edgesData[1,1], edgesData[1,2]])
But how can we get the similarities for all edges in one table? I've tried
similarity.jaccard(graph, vids = V(graph)[edgesData[,1], edgesData[,2]])
but it didn't work. I've also tried
similarity.jaccard(graph, vids = E(graph))
It didn't work either. An obvious way is to use a loop that retrieves each row from the data frame, but that seems like a bad idea. Can anyone give me some advice? Thanks!
Edit: OK, it seems the question is a bit confusing, so I've written a loop solution for it:
tpData <- edgesData
simList <- c()
while(nrow(tpData) > 0) {
v1 <- tpData[1,1]
v2 <- tpData[1,2]
simList <- c(simList, similarity.jaccard(graph, V(graph)[v1, v2])[1,2])
tpData <- tpData[-1,]
}
In this code I take the two elements [,1] and [,2] from each row and then calculate their similarity. Since there are nearly 20 million rows, it takes forever to finish. There has got to be a better way to do this. Can someone help me, please? Thanks.
Not sure if this is the most efficient way, but I've used it in the past myself. This is a simple example using dplyr:
library(igraph)
g <- graph.ring(5)
data.frame(similarity.jaccard(g)) -> dt
dt
# X1 X2 X3 X4 X5
# 1 1.0000000 0.0000000 0.3333333 0.3333333 0.0000000
# 2 0.0000000 1.0000000 0.0000000 0.3333333 0.3333333
# 3 0.3333333 0.0000000 1.0000000 0.0000000 0.3333333
# 4 0.3333333 0.3333333 0.0000000 1.0000000 0.0000000
# 5 0.0000000 0.3333333 0.3333333 0.0000000 1.0000000
library(dplyr)
data.frame(expand.grid(1:nrow(dt),1:ncol(dt))) %>% # combine all nodes (pairs)
select(node1=Var1, node2=Var2) %>% # rename
group_by(node1,node2) %>% # group in order to get each row separately
do(data.frame(simil = dt[.$node1,.$node2])) %>% # pick the corresponding similarities based on the nodes' pair
ungroup
# node1 node2 simil
# 1 1 1 1.0000000
# 2 1 2 0.0000000
# 3 1 3 0.3333333
# 4 1 4 0.3333333
# 5 1 5 0.0000000
# 6 2 1 0.0000000
# 7 2 2 1.0000000
# 8 2 3 0.0000000
# 9 2 4 0.3333333
# 10 2 5 0.3333333
# .. ... ... ...
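If only the pairs listed in the edge table are needed, rather than every combination of nodes, the lookup can probably be done without grouping at all. The sketch below indexes the similarity matrix with a two-column matrix of (row, column) positions, which is plain base R behaviour. It assumes that both columns of edgesData hold numeric vertex ids matching the matrix rows and columns; the small edgesData built here is a made-up stand-in for the real one. It also still builds the full n x n matrix, so it only helps when the number of nodes is small enough for that matrix to fit in memory.

```r
library(igraph)

# Toy graph standing in for the real one: a ring of 5 vertices.
g <- make_ring(5)

# Hypothetical stand-in for edgesData: one row per edge to score.
# Assumption: both columns are numeric vertex ids (not vertex names).
edgesData <- data.frame(from = c(1L, 1L, 2L), to = c(3L, 2L, 4L))

# Full vertex-by-vertex Jaccard similarity matrix (n x n).
sim <- similarity(g, method = "jaccard")

# Indexing a matrix with a two-column matrix picks one cell per row:
# row i of 'pairs' selects sim[pairs[i, 1], pairs[i, 2]].
pairs <- cbind(edgesData$from, edgesData$to)
edgesData$simil <- sim[pairs]

edgesData
```

On the ring example this gives 0.3333333, 0 and 0.3333333 for the three pairs, which agrees with the matrix printed above. If the vertices are identified by name instead of by id, the names would first need to be converted to positions, for example with match() against V(g)$name.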