
How to avoid loops in R?

I'm working on a project in R. Given a large data set of edges between nodes, the task is to test whether a set of candidate edges is real or not. Since the basic element of the project is an edge, that is what we need to classify. Here's the problem: we've created a data frame with two columns, "from" nodes and "to" nodes, to represent the edges (edgesData), and built a graph from it with igraph (graph). We can calculate the similarity of two particular nodes using

similarity.jaccard(graph, vids = V(graph)[edgesData[1,1], edgesData[1,2]])

But how can we get a table of all edges? I've tried

similarity.jaccard(graph, vids = V(graph)[edgesData[,1], edgesData[,2]])

but it didn't work. I've also tried

similarity.jaccard(graph, vids = E(graph))

but it didn't work either. The obvious approach is a loop that processes each row of the data frame, but that seems like a bad idea. Can anyone give me some advice? Thanks!

Edit: OK, it seems the question is a bit confusing, so I've written out the loop solution:

tpData <- edgesData
simList <- c()

while(nrow(tpData) > 0) {
  v1 <- tpData[1,1]    # "from" node of the first remaining row
  v2 <- tpData[1,2]    # "to" node
  simList <- c(simList, similarity.jaccard(graph, V(graph)[v1, v2])[1,2])
  tpData <- tpData[-1,]    # drop the processed row
}

In this code I take the two elements [,1] and [,2] from each row, then calculate their similarity. Since there are nearly 20 million rows, it takes forever to finish. There has to be a better way to do this. Can someone help me please? Thanks.
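One loop-free sketch of the same per-edge computation (assuming the vertex IDs in edgesData are the same 1-based integer indices igraph assigns its vertices): compute the similarity matrix once for the whole graph, then index it with a two-column matrix, which R treats as a list of (row, column) pairs:

```r
library(igraph)

# Small stand-in for the real data; edgesData is assumed to hold
# 1-based integer vertex IDs matching igraph's vertex indices
g <- graph.ring(5)
edgesData <- data.frame(from = c(1, 2, 3), to = c(3, 4, 5))

# One call computes the full pairwise Jaccard matrix for the graph
simMat <- similarity.jaccard(g)

# Indexing a matrix with a two-column matrix pulls out the (from, to)
# entries in a single vectorized step -- no loop over rows
simList <- simMat[cbind(edgesData$from, edgesData$to)]
simList
# 0.3333333 0.3333333 0.3333333
```

If the full vertex-by-vertex matrix is too large to hold in memory, the same matrix indexing can be applied chunk by chunk over subsets of rows.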

Not sure if this is the most efficient way, but I've used it myself in the past. Here is a simple example using dplyr:

    library(igraph)

    g <- graph.ring(5)

    dt <- data.frame(similarity.jaccard(g))

    dt

#         X1        X2        X3        X4        X5
# 1 1.0000000 0.0000000 0.3333333 0.3333333 0.0000000
# 2 0.0000000 1.0000000 0.0000000 0.3333333 0.3333333
# 3 0.3333333 0.0000000 1.0000000 0.0000000 0.3333333
# 4 0.3333333 0.3333333 0.0000000 1.0000000 0.0000000
# 5 0.0000000 0.3333333 0.3333333 0.0000000 1.0000000

    library(dplyr)

    expand.grid(1:nrow(dt), 1:ncol(dt)) %>%              # combine all nodes (pairs)
      select(node1=Var1, node2=Var2) %>%                  # rename
      group_by(node1,node2) %>%                           # group in order to get each row separately
      do(data.frame(simil = dt[.$node1,.$node2])) %>%     # pick the corresponding similarities based on the nodes' pair
      ungroup

#    node1 node2     simil
# 1      1     1 1.0000000
# 2      1     2 0.0000000
# 3      1     3 0.3333333
# 4      1     4 0.3333333
# 5      1     5 0.0000000
# 6      2     1 0.0000000
# 7      2     2 1.0000000
# 8      2     3 0.0000000
# 9      2     4 0.3333333
# 10     2     5 0.3333333
# ..   ...   ...       ...
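A base-R alternative for the same flattening (a sketch; `as.table()` followed by `as.data.frame()` melts a matrix into long (row, column, value) form without the group_by/do round-trip):

```r
library(igraph)

g <- graph.ring(5)
simMat <- similarity.jaccard(g)

# as.table() + as.data.frame() melts the matrix into one row per cell
long <- as.data.frame(as.table(simMat))
names(long) <- c("node1", "node2", "simil")

# Unnamed dimensions come back as factor levels A, B, C, ...;
# convert them to the 1-based integer indices used above
long$node1 <- as.integer(long$node1)
long$node2 <- as.integer(long$node2)

head(long)
```

The rows come out in column-major order rather than the sorted order shown above, but the content is the same after sorting by node1 and node2.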
