Make a for loop run faster in R

I want to create a model in which I duplicate a sentence several times, introducing random errors each time. The duplicates of the sentence also get duplicated. So, in cycle one, I start with just "example_sentence". In cycle two, I have two copies of that sentence. In cycle three, I have four copies. I want to do this for 25 cycles with 20k sentences. The code I wrote runs far too slowly, and I am wondering whether there is a way to make my nested for loops more efficient. Here is the slowest part of the code:

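    alphabet <- c("a","b","d","j")
    modr1 <- "sentencetoduplicate"

    # Sampling one element from a single 1 and 999 zeros gives an
    # error with probability 1/1000
    errorRate <- c(1, rep(0, 999))

    duplicate <- c(modr1)
    for (q in 1:25) {
      collect <- c()
      # Make a mutated copy of every sentence produced so far
      for (z in seq_along(duplicate)) {
        modr1 <- duplicate[z]
        compile1 <- c()
        # Walk the sentence one character at a time; growing vectors with
        # append() copies them on every call, which adds to the slowness
        for (k in 1:nchar(modr1)) {
          error <- sample(errorRate, 1, replace = TRUE)
          if (error == 1) {
            # Replace this character with a random letter from the alphabet
            compile1 <- append(compile1, sub(substring(modr1,k,k),sample(alphabet,1,replace=TRUE),substring(modr1,k,k)))
          } else {
            compile1 <- append(compile1, substring(modr1,k,k))
          }
        }
        modr1 <- paste(compile1, collapse='')
        collect <- append(collect, modr1)
      }
      # Keep the originals and append the mutated copies, doubling the vector
      duplicate <- append(duplicate, collect)
    }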
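For a sense of scale, the number of copies can be worked out directly: the vector doubles every cycle, so after q cycles there are 2^q copies of each sentence. The 19-character length used below is an illustrative assumption taken from "sentencetoduplicate"; real sentences will differ.

```r
cycles <- 25
n_sentences <- 20000

# The vector doubles every cycle: 2^25 = 33,554,432 copies of one sentence
copies_per_sentence <- 2^cycles

# Across all 20k sentences: 671,088,640,000 copies
total_copies <- copies_per_sentence * n_sentences

# Characters to store, ASSUMING every sentence has 19 characters like
# "sentencetoduplicate": 12,750,684,160,000
total_chars <- total_copies * nchar("sentencetoduplicate")

format(c(copies_per_sentence, total_copies, total_chars),
       big.mark = ",", scientific = FALSE)
```

Numbers of this size suggest that loop speed alone is unlikely to make the full 25-cycle, 20k-sentence run practical in memory.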

Here is a faster approach to your loop, but I think the problem of scaling it to 20K sentences remains, because the number of copies still doubles every cycle.

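    library(data.table)

    # Mutate each character with probability error_rate, then split the flat
    # character vector back into one element per input row
    f <- function(let, alphabet = c("a","b","c","d","j"), error_rate = 1/1000) {
      lenlet <- length(let)
      let <- unlist(let)
      k <- rbinom(length(let), 1, prob = error_rate)  # 1 = error at this position
      let[k == 1] <- sample(alphabet, size = sum(k == 1), replace = TRUE)
      # stringsAsFactors = FALSE keeps the columns as character on R < 4.0
      return(as.list(as.data.frame(matrix(let, ncol = lenlet), stringsAsFactors = FALSE)))
    }

    modr1 <- "sentencetoduplicate"
    k <- data.table(list(strsplit(modr1, "")[[1]]))

    for (q in 1:25) {
      k[, V1 := list(f(V1))]     # mutate every row in place
      k <- k[rep(1:nrow(k), 2)]  # then duplicate every row
    }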

Update: this is a slightly faster version. (Note that it no longer groups with by=1:nrow(k).)
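A possible alternative, sketched in base R rather than data.table, keeps the question's semantics: the originals stay unchanged and only the new copies are mutated. (The data.table version above mutates every row before duplicating, so both copies of a row are identical until the next cycle.) The helper names mutate_copies and simulate_duplication are hypothetical. The error model assumes each character independently errs with probability error_rate, matching the 1-in-1000 errorRate vector in the question. All copies of one sentence are held in a character matrix, so each cycle is a single vectorized mutation plus an rbind instead of per-character loops.

```r
# Hypothetical helper: one cycle of "keep the originals, append a mutated
# copy of each". Rows are copies, columns are character positions.
mutate_copies <- function(m, alphabet, error_rate) {
  copy <- m
  # runif(n) < p is TRUE with probability p (same as rbinom(n, 1, p) == 1)
  hit <- runif(length(copy)) < error_rate
  copy[hit] <- sample(alphabet, sum(hit), replace = TRUE)
  rbind(m, copy)  # originals first, then copies, as in append(duplicate, collect)
}

# Hypothetical driver for a single sentence
simulate_duplication <- function(sentence, cycles,
                                 alphabet = c("a", "b", "d", "j"),
                                 error_rate = 1 / 1000) {
  # Start with one row holding the sentence's characters
  m <- matrix(strsplit(sentence, "")[[1]], nrow = 1)
  for (q in seq_len(cycles)) {
    m <- mutate_copies(m, alphabet, error_rate)
  }
  # Collapse each row back into one string per copy
  apply(m, 1, paste, collapse = "")
}
```

For example, simulate_duplication("sentencetoduplicate", 10) returns 1024 strings, the first of which is always the unmutated original. This removes the per-character loops, but at 25 cycles the matrix still has 2^25 rows per sentence, so memory rather than loop speed remains the limiting factor.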

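Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.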

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM