Making a nested for loop run faster in R

I have the following code (a nested for loop) in R, which is extremely slow. The loop matches values from two columns, picks up the corresponding file, iterates through that file to find a match, and then takes the matching row from the file. There can be more than 100,000 iterations. Could someone provide some insight into how to speed this up?

for(i in 1: length(Jaspar_ids_in_Network)) {
  m <- Jaspar_ids_in_Network[i]
  gene_ids <- as.character(GeneTFS$GeneIds[i])
  gene_names <- as.character(GeneTFS$Genes[i])

  print("i")
  print(i)

  for(j in 1: length(Jaspar_ids_in_Exp)) {
    l <- Jaspar_ids_in_Exp[j]
    print("j")
    print(j)

    if (m == l) {
      check <- as.matrix(read.csv(file=paste0(dirpath,listoffiles[j]),sep=",",header=FALSE))
      data_check <- data.frame(check)
      for(k in 1: nrow(data_check)) {
        gene_ids_JF <- as.character(data_check[k,3])
        genenames_JF <- as.character(data_check[k,4])

        if(gene_ids_JF == gene_ids) {
          GeneTFS$Source[i] <- as.character(data_check[k,3])
          data1 <- rbind(data1, cbind(as.character(data_check[k,3]),  
                                      as.character(data_check[k,8]), 
                                      as.character(data_check[k,9]),  
                                      as.character(data_check[k,6]), 
                                      as.character(data_check[k,7]),  
                                      as.character(data_check[k,5])))
        } else if (toupper(genenames_JF) == toupper(gene_names)) { 
          GeneTFS$Source[i] <- as.character(data_check[k,4])
          data1 <- rbind(data1, cbind(as.character(data_check[k,4]),
                                      as.character(data_check[k,5]), 
                                      as.character(data_check[k,6]), 
                                      as.character(data_check[k,7]),
                                      as.character(data_check[k,8]),
                                      as.character(data_check[k,2])))
        } else {
         # GeneTFS[i,4] <- "No Evidence"    
        }
      }
    } else {
      # GeneTFS[i,4] <- "Record Not Found"          
    }
  }  
}
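
Much of the cost in the loop above probably comes from printing on every iteration, scanning each file row by row, and growing data1 with rbind inside the loop. Below is a minimal sketch of how the innermost k loop might be vectorised, assuming the files have the column layout used above (gene ID in column 3, gene name in column 4). The helper name match_rows, the output column names X1 to X6, and the toy data are hypothetical and only serve as illustration.

```r
# Hypothetical helper: find all rows of one file that match a gene,
# using vectorised comparisons instead of the row-by-row k loop.
match_rows <- function(data_check, gene_id, gene_name) {
  # Convert every column to character once, rather than once per cell.
  data_check[] <- lapply(data_check, as.character)

  by_id <- data_check[[3]] %in% gene_id
  # A name match only counts where the ID did not match (the else-if branch).
  by_name <- !by_id & toupper(data_check[[4]]) %in% toupper(gene_name)

  # Column order for each branch, copied from the original loop.
  id_rows   <- data_check[by_id,   c(3, 8, 9, 6, 7, 5), drop = FALSE]
  name_rows <- data_check[by_name, c(4, 5, 6, 7, 8, 2), drop = FALSE]
  names(id_rows)   <- paste0("X", 1:6)
  names(name_rows) <- paste0("X", 1:6)

  # Combine both kinds of match and restore the original file order.
  out <- rbind(id_rows, name_rows)
  out <- out[order(c(which(by_id), which(by_name))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy stand-in for one file read with read.csv(..., header = FALSE).
toy <- data.frame(
  V1 = c("a", "b", "c"), V2 = c("s1", "s2", "s3"),
  V3 = c("100", "200", "300"), V4 = c("TP53", "myc", "GATA1"),
  V5 = c("e1", "e2", "e3"), V6 = c("f1", "f2", "f3"),
  V7 = c("g1", "g2", "g3"), V8 = c("h1", "h2", "h3"),
  V9 = c("i1", "i2", "i3"),
  stringsAsFactors = FALSE
)

# Collect per-gene results in a list and bind once at the end,
# instead of growing data1 with rbind inside the loop.
results <- list(
  match_rows(toy, gene_id = "100", gene_name = "none"),
  match_rows(toy, gene_id = "999", gene_name = "MYC")
)
data1 <- do.call(rbind, results)
```

In the real loop, each file would ideally be read once and reused, and the print calls dropped. The sketch leaves out the GeneTFS$Source update, which would need to be handled separately.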

If you pull the logic that handles a single pair out into a function f(m, l), you can replace the double loop with:

outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, Vectorize(f))
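
To make the call above concrete, here is a small sketch with a placeholder f. The body of f and the toy ID vectors are assumptions for illustration. In the real task f would hold the file lookup, and it would need to return a single value per pair for outer() to assemble a matrix.

```r
# Hypothetical pair function. `if` only accepts a single condition,
# so f is not vectorised on its own; Vectorize() wraps it with mapply().
f <- function(m, l) {
  if (m == l) 1L else 0L
}

# Toy stand-ins for Jaspar_ids_in_Network and Jaspar_ids_in_Exp.
network_ids <- c("MA0001", "MA0002", "MA0003")
exp_ids     <- c("MA0002", "MA0003", "MA0004", "MA0002")

# One call to f per (network, exp) pair; rows follow network_ids,
# columns follow exp_ids.
hits <- outer(network_ids, exp_ids, Vectorize(f))

# Row/column indices of the matching pairs, i.e. the (i, j) values
# for which the original loop would open a file.
pairs <- which(hits == 1L, arr.ind = TRUE)

# When the pair test is plain equality, a vectorised operator can be
# passed to outer() directly, with no Vectorize() wrapper.
direct <- outer(network_ids, exp_ids, "==")
```

Note that Vectorize() still calls f once for every pair, so this mainly tidies the code. Any real speed-up is more likely to come from making the work inside f cheaper, or from computing the matching pairs with a vectorised comparison as in the last line.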
